Bilateral Co-authorship Indicators Based on Fractional Counting

It is well known that collaboration between scientists, and consequently between institutes, countries, and sectors, has increased considerably over recent decades (Glänzel & Schubert, 2004; Wuchty, Jones, & Uzzi, 2007). Yet the problem of how to measure collaboration has not been solved. It is customary to operationalize the notion of collaboration through the notion of co-authorship, although colleagues have pointed out that this is not quite accurate (Katz & Martin, 1997). We recall that Sonnenwald (2007) defines research collaboration as “the interaction taking place within a social context among two or more scientists that facilitates the sharing of meaning and completion of tasks with respect to a mutually shared, superordinate goal.”

In this short contribution we will not try to provide an overview of the many publications related to scientific collaboration, invisible colleges, interdisciplinarity, and team science, referring the interested reader to the following publications (Liu et al., 2020; Rousseau, Zhang, & Hu, 2019; Sonnenwald, 2007; Wagner et al., 2011), among others. Instead we come straight to the point and provide an indicator for co-authorship that we think is more refined than existing ones, at least in a certain context (explained further on), mainly because it takes fractional counting into account, and this in a subtle way.

Introducing a framework

We first note that it is assumed that we work with a fixed database D and consider publications published during a fixed publication window.

Although country co-authorship is typically a network property, this aspect may be used in an explicit way, by drawing actual co-authorship networks and using network indicators, or it may be downplayed, e.g. by just providing a ranked list of the countries with which a given country collaborates or has collaborated in the past. Moreover, this network may be weighted or not, and if weighted one may use natural numbers (whole counting) or any positive real number (different variations on fractional counting). Most, but not all, of the older investigations used whole counting and did not use other network indicators besides degree centrality in an undirected network (Frame & Carpenter, 1979; Luukkonen, Persson, & Sivertsen, 1992; Luukkonen et al., 1993; Narin, Stevens, & Whitlow, 1991; Russell, 1995; Schubert & Braun, 1990).

A recent article by Perianes-Rodriguez, Waltman, and van Eck (2016) provides a framework for at least some co-authorship studies. These authors draw a clear distinction between indicators for a phenomenon (here collaboration between countries as measured through joint publications) and indicators for the co-authorship network. Admitting that the network point of view is often a valid one (Leydesdorff & Park, 2017; Park, Yoon, & Leydesdorff, 2016), we nevertheless focus on finding an indicator for collaboration between two entities without taking the network aspect into account. Before continuing we point out that there is still another point of view in co-authorship studies. One may try to find an indicator for the global “collaborativeness” of a set of publications, ranging from the simple percentage of co-authored publications in the set to the advanced Egghe-English approach (Egghe, 1991; English, 1991; Rousseau, 2011). As said above, our article focuses on an indicator for co-authorship between two entities.

2.1 First step: counting publications on country level

2.1.1 Full counting for country C. When using full counting the production of country C (misusing the term production for the number of publications as given by a counting scheme) is set equal to the number of publications in which at least one author has at least one address in country C.

2.1.2 (Basic) fractional counting: if at least one of the authors of a publication has at least one address in country C then this publication contributes to the publication score of country C. Assuming that each author has exactly one address (or all addresses of this author are in the same country), then the contribution of country C in a publication with N authors is p_C = AU/N, where AU is the number of authors with all addresses in country C.

It may, however, happen that some authors have addresses in more than one country. Then p_C must be adapted. We proceed as follows: if AU authors with at least one address in country C contribute to this N-author publication, then the fractional contribution of country C to this publication is (1) $p_{C} = \frac{1}{N} \sum_{j = 1}^{AU} \frac{a_{j}}{a_{j} + b_{j}},$ where a_j is the number of addresses of author j in country C, and b_j is the number of addresses of author j not situated in country C. It is clear that, allowing b_j = 0, formula (1) is always valid, whether or not author j has addresses outside country C.

If p_K represents the fractional contribution of a country K in a fixed article, then $\sum_{K} p_{K} = 1$, where the sum runs over all countries K.

Finally, the fractional production of country C in the set under investigation is the sum of all fractional scores of all publications involving country C.
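
As an illustration of equation (1), the following minimal Python sketch computes the fractional contribution of every country in one publication. The input format (one list of address countries per author), the function name and the three-author publication are assumptions made for this example only; they do not come from the data used in this article.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_contributions(author_addresses):
    """Return {country: p_country} for one publication, following equation (1).

    author_addresses holds one list per author, giving the country of each
    of that author's addresses (a hypothetical input format).
    """
    n_authors = len(author_addresses)
    scores = defaultdict(Fraction)
    for addresses in author_addresses:
        for country in addresses:
            # Each address carries 1 / (N * number of addresses of this author),
            # so a_j addresses in country C add up to (1/N) * a_j / (a_j + b_j).
            scores[country] += Fraction(1, n_authors * len(addresses))
    return dict(scores)


# Made-up publication with three authors.
publication = [
    ["BE"],              # author 1: one Belgian address
    ["NL", "NL", "BE"],  # author 2: two Dutch addresses and one Belgian address
    ["DE"],              # author 3: one German address
]
p = fractional_contributions(publication)
for country, share in sorted(p.items()):
    print(country, share)
print("sum over all countries:", sum(p.values()))
```

Exact fractions are used so that the property that all country contributions of one article sum to one can be checked without rounding errors.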

2.2 Country co-authorship indicators

2.2.1 Classical, full counting approach (absolute values)

Let A be the set of all articles in database D with at least one author with an address in country C₁ and let #A be the number of articles in A.

Let B be the set of all articles in database D with at least one author with an address in country C₂ and let #B be the number of articles in B.

Then A∩B is the set of all articles with at least one address in country C₁ and at least one address in country C₂; #(A∩B) is then the absolute value of the co-authorship indicator (co-authorship between countries C₁ and C₂) using full counting. We denote this indicator as FCCI(C₁,C₂).

2.2.2 Classical, full counting approach (relative values)

A more interesting indicator is #(A∩B)/#A = q_C₁. This is the percentage of C₁–C₂ co-authorships (in the sense of classical whole counting) among all C₁-publications. Yet, assuming that #(A∩B) stays constant, q_C₁ can decrease either because country C₁ works more on its own or because country C₁ collaborates more with other countries. So this indicator cannot be interpreted correctly on its own; more information is necessary. Of equal interest is #(A∩B)/#B = q_C₂, the percentage of C₁–C₂ co-authorships among all C₂-publications.
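
The full counting indicators of this section can be sketched in a few lines of Python. The representation of a publication as a set of country codes, the function name and the five publications below are assumptions chosen for illustration.

```python
def full_counting_indicators(publications, c1, c2):
    """Return (FCCI, q_c1, q_c2) for countries c1 and c2.

    publications is a list of sets; each set holds the countries that
    occur in the byline of one publication (hypothetical input format).
    """
    a = {i for i, countries in enumerate(publications) if c1 in countries}
    b = {i for i, countries in enumerate(publications) if c2 in countries}
    fcci = len(a & b)                    # #(A ∩ B)
    q_c1 = fcci / len(a) if a else 0.0   # share among all c1-publications
    q_c2 = fcci / len(b) if b else 0.0   # share among all c2-publications
    return fcci, q_c1, q_c2


# Made-up set of five publications.
pubs = [{"BE"}, {"BE", "NL"}, {"NL"}, {"BE", "NL", "DE"}, {"BE", "FR"}]
fcci, q_be, q_nl = full_counting_indicators(pubs, "BE", "NL")
print(fcci, q_be, round(q_nl, 3))
```

In this made-up set the two relative values differ although the number of joint publications is the same, which is why both q_C₁ and q_C₂ are reported.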

A new approach that takes fractions into account

3.1 A co-authorship score of two different countries

We start from the idea that the most intense co-authorship pattern between two countries in one article occurs when the two countries contribute equally (the perfectly balanced case) and no other country contributes. We give this situation a score of 1. In the purely theoretical case that two countries never publish on their own, always publish co-authored publications, and this always in a perfectly balanced way, i.e. p_C₁ = p_C₂, (here and further on p-values are calculated as proposed in equation (1)) then their total co-authorship score is equal to their total number of publications. We denote the co-authorship score of two different countries C₁ and C₂ in one publication by cs(C₁,C₂).

Definition: co-authorship score of two different countries C₁ and C₂ in one publication.

(2) $cs (C_{1}, C_{2}) = \frac{4}{\frac{1}{p_{C_{1}}} + \frac{1}{p_{C_{2}}}} = \frac{4 p_{C_{1}} p_{C_{2}}}{p_{C_{1}} + p_{C_{2}}}$

The formula for cs is two times the harmonic mean of the fractional contributions of countries C₁ and C₂. We multiply the harmonic mean by two so that in the perfectly balanced case cs(C₁,C₂) receives a score of 1 (and not 0.5). This is merely a convention and is not essential for comparing scores over time or between countries. If p_C₁ + p_C₂ = 1, then cs(C₁,C₂) = 4p_C₁p_C₂.

Definition: co-authorship score, cs, of two different countries C₁ and C₂ for a given set of publications.

If PUB is the number of publications under consideration, then the co-authorship score cs(C₁,C₂) is defined as: (3) $cs (C_{1}, C_{2}) = \sum_{j = 1}^{PUB} {cs}_{j} (C_{1}, C_{2})$
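
Equations (2) and (3) can be sketched as follows. The function names and the three publications are made up for this illustration. Returning a score of 0 when one of the two countries does not contribute is an assumption of this sketch, made because the first form of equation (2) is undefined for a zero contribution.

```python
from fractions import Fraction


def cs_single(p1, p2):
    """Equation (2): twice the harmonic mean of the two fractional contributions."""
    if p1 == 0 or p2 == 0:
        # Assumption of this sketch: no joint work in this publication scores 0.
        return Fraction(0)
    return 4 * p1 * p2 / (p1 + p2)


def cs_total(pairs):
    """Equation (3): sum of the single-publication scores over a set of publications."""
    return sum(cs_single(p1, p2) for p1, p2 in pairs)


# Made-up fractional contributions (p_C1, p_C2) for three publications.
pairs = [
    (Fraction(1, 2), Fraction(1, 2)),  # perfectly balanced, no other country
    (Fraction(1, 5), Fraction(4, 5)),  # unbalanced, no other country
    (Fraction(1, 3), Fraction(1, 3)),  # balanced, one third goes to other countries
]
scores = [cs_single(p1, p2) for p1, p2 in pairs]
total = cs_total(pairs)
print(scores)
print(total, float(total))
```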

3.2 Properties of the cs-indicator

We introduced the new cs-score without proper arguments. Next we show a list of good properties of cs. This list serves as argumentation for the introduction of this new cs-score. Indeed, we think that the properties shown in this list are necessary, or at least highly desirable for a proper cs-score. We first note that if a C₁–C₂ co-authored publication is added to the set under consideration then cs(C₁,C₂), equation (3), increases. So, besides an increase for the full counting score, also the fractional counting score increases.

The cs-value as defined in equation (2), i.e. for one publication, has the following properties.

P1) For all countries C₁, C₂ and for all publications: cs(C₁,C₂) = cs(C₂,C₁); cs is a symmetric measure. This symmetry property also holds for equation (3).

P2) For all countries C₁ and C₂ and for all publications: 0 ≤ cs(C₁,C₂) ≤ 1; cs is a non-negative measure, with upper bound equal to one.

P3) The co-authorship score cs(C₁,C₂) = 1 if and only if p_C₁ = p_C₂ = 0.5; this is the upper bound for the co-authorship score of two countries in one publication. This upper bound corresponds to the perfectly balanced case.

P4) If for countries C₁ and C₂: p_C₁ ≤ p_C₂, then for all countries C₃: cs(C₁,C₃) ≤ cs(C₂,C₃), which is a monotonicity property.

P5) The cs-value of countries C₁ and C₂ in one publication does not depend on how many (one, two or more) other countries and which other countries contribute to a joint publication. This is a form of anonymity property. Below we provide some comments on this property.

P6) Continuity

We can write cs(C₁,C₂) as $\frac{4 p_{C_{1}} p_{C_{2}}}{p_{C_{1}} + p_{C_{2}}}$. Denoting the fractional contribution of other countries (i.e. not country C₁ or country C₂) by p_O (O for “other”), we always have p_C₁ + p_C₂ + p_O = 1. Hence, we can rewrite cs(C₁,C₂) as $\frac{4 p_{C_{1}} (1 - p_{C_{1}} - p_{O})}{1 - p_{O}}$. This shows that cs(C₁,C₂) can be written as a function of two independent variables, either p_C₁ and p_C₂ or p_C₁ and p_O (and of course also as a function of p_C₂ and p_O), where these two variables are always between zero and one and their sum is smaller than or equal to one. Considering these variables as real variables, we see that cs is a continuous function of two variables. Roughly speaking this means that a small change in any of these variables leads to a small change in cs(C₁,C₂).

P7) Next we show that the more uneven the contribution of the two countries, the smaller their cs-value. Conversely, when the two countries have an equal contribution, then their cs-value reaches a maximum, depending on p_O.

Assume that p_O (< 1) is fixed, then cs(C₁,C₂) increases for $0 < p_{C_{1}} \leq \frac{1 - p_{O}}{2}$ and then decreases for $\frac{1 - p_{O}}{2} < p_{C_{1}} < 1 - p_{O}$. Indeed: $cs (C_{1}, C_{2}) = \frac{4 p_{C_{1}} p_{C_{2}}}{p_{C_{1}} + p_{C_{2}}} = \frac{4 p_{C_{1}} (1 - p_{C_{1}} - p_{O})}{1 - p_{O}}$. This is clearly a parabola in the variable p_C₁ with top in the point with coordinates $(\frac{1 - p_{O}}{2}, 1 - p_{O})$. This top is reached when the contribution of the two countries is equal. In the special case that p_O = 0, this top is situated in the point (0.5, 1).

P8) In this last property we show that when the relative contributions of the two countries are given, then their cs-value depends on p_O: the larger p_O the smaller their cs-value.

In the previous property we kept p_O (< 1) fixed. Now we keep $P = \frac{p_{C_{1}}}{p_{C_{2}}} \leq 1$ fixed and show that the larger p_O, the smaller cs(C₁,C₂), which is again a property one may expect to hold for a proper co-authorship measure for two countries.

Proof: $P = \frac{p_{C_{1}}}{p_{C_{2}}}$ implies that P*p_C₂ = p_C₁ and hence $cs (C_{1}, C_{2}) = \frac{4 p_{C_{1}} p_{C_{2}}}{p_{C_{1}} + p_{C_{2}}} = \frac{4 P {(p_{C_{2}})}^{2}}{p_{C_{2}} (1 + P)} = \frac{4 P * p_{C_{2}}}{1 + P}$. As p_C₁ + p_C₂ + p_O = 1 we have here: (1+P)*p_C₂ + p_O = 1, or $p_{C_{2}} = \frac{1 - p_{O}}{1 + P}$. Consequently, $cs (C_{1}, C_{2}) = \frac{4 P (1 - p_{O})}{{(1 + P)}^{2}}$. This clearly shows that if P is fixed, cs decreases in p_O.

We note that if P = 1 (C₁ and C₂ have an equal contribution) cs(C₁,C₂) = 1-p_O.
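
Properties P7 and P8 can also be checked numerically. The sketch below does this with exact fractions; the fixed values p_O = 1/5 and P = 1/3, the grid size and all names are arbitrary choices for this illustration, and a grid check is of course no substitute for the proofs above.

```python
from fractions import Fraction


def cs(p1, p2):
    """Equation (2) for strictly positive contributions."""
    return 4 * p1 * p2 / (p1 + p2)


# P7: with p_O fixed, scan p_C1 over a grid in the open interval (0, 1 - p_O).
p_o_fixed = Fraction(1, 5)
steps = 1000
grid = [(1 - p_o_fixed) * Fraction(k, steps) for k in range(1, steps)]
best_p1 = max(grid, key=lambda p1: cs(p1, 1 - p_o_fixed - p1))
top = cs(best_p1, 1 - p_o_fixed - best_p1)
print("top of the parabola:", best_p1, top)

# P8: with P = p_C1 / p_C2 fixed, compare cs with the closed form
# 4P(1 - p_O) / (1 + P)^2 for increasing values of p_O.
ratio = Fraction(1, 3)
values = []
for k in range(10):
    p_o = Fraction(k, 10)
    p2 = (1 - p_o) / (1 + ratio)
    p1 = ratio * p2
    assert cs(p1, p2) == 4 * ratio * (1 - p_o) / (1 + ratio) ** 2
    values.append(cs(p1, p2))
print("cs for increasing p_O:", [float(v) for v in values])
```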

3.3 A co-authorship intensity indicator

We propose the following indicator as a co-authorship intensity indicator, denoted as CI(C₁,C₂): (4) $CI (C_{1}, C_{2}) = \frac{cs (C_{1}, C_{2})}{FCCI (C_{1}, C_{2})}$

Two countries may co-author often (FCCI is high) while their collaboration usually involves many other countries or is highly asymmetric (most authors belong to one country). In that case the CI-index is small. If, on the other hand, scientists of the two countries mostly work together without a third party and in a balanced way, then the intensity index is relatively high. We write “relatively” because in real situations CI will rarely be close to one.
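
The two situations just described can be contrasted in a small sketch of equation (4). Both lists of fractional contributions are made up, and the function names are chosen for this illustration only. Each list contains only publications in which both countries occur, so that its length equals FCCI.

```python
def cs_single(p1, p2):
    """Equation (2); a publication without joint work scores 0 (assumption)."""
    return 4 * p1 * p2 / (p1 + p2) if p1 > 0 and p2 > 0 else 0.0


def co_authorship_intensity(joint_pubs):
    """Equation (4): total cs divided by FCCI.

    joint_pubs holds one (p_C1, p_C2) pair per publication co-authored
    by the two countries, so FCCI is simply the length of the list.
    """
    fcci = len(joint_pubs)
    if fcci == 0:
        raise ValueError("CI is undefined when there are no joint publications")
    return sum(cs_single(p1, p2) for p1, p2 in joint_pubs) / fcci


# Mostly bilateral and balanced joint work (made-up values).
balanced = [(0.5, 0.5), (0.4, 0.6), (0.5, 0.5)]
# Joint work dominated by other countries or highly asymmetric (made-up values).
crowded = [(0.1, 0.1), (0.05, 0.6), (0.1, 0.2), (0.02, 0.08)]

ci_balanced = co_authorship_intensity(balanced)
ci_crowded = co_authorship_intensity(crowded)
print(round(ci_balanced, 3), round(ci_crowded, 3))
```

Although the second pair of countries has more joint publications, its intensity value is much lower.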

Comments

We reconsider the anonymity property P5. Of course the cs-value depends on the fractions p_C₁, p_C₂ and p_O with p_C₁ + p_C₂ + p_O = 1, but we mean that the cs-value does not depend on how many other countries are involved, and certainly not on which countries these other countries are. We admit that depending on the type of investigation having this information might be useful. Yet, other types of indicators are needed for this.

Until now we have talked about countries. Yet, the formula for countries can also be used within a university, where the role of countries is played by departments or schools. Formulae (2) and (3) can even be used for scientists, especially if a scientist's contribution to an N-author publication is not necessarily equal to 1/N but can be any number strictly larger than zero.

For each concrete investigation one must decide whether two (or more) addresses in one university count as one or as more than one. Maybe the decision can be based on the postal code, but sometimes the same building can host different administrative units. We do not go into these practical difficulties, but just mention them for completeness’ sake. In the real-world example shown further on, we use addresses as provided in the Web of Science (WoS).

Examples

5.1 Made-up data

These data are presented to illustrate numerical values resulting from the definition of this new indicator.

The last row of Table 1 illustrates the perfectly balanced case: the two countries contribute equally and no other country contributes. The first row of Table 2 illustrates a three-country collaborative publication in which the two target countries contribute equally (as do several other rows).

Table 1

Examples where no other countries are involved.

p_C₁	p_C₂	cs(C₁,C₂)
1/5	4/5	0.640
2/5	3/5	0.960
1/6	5/6	0.556
2/6	4/6	0.889
3/6	3/6	1.000

Table 2

Examples where other countries are involved.

p_C₁	p_C₂	p_O	cs(C₁,C₂)
1/6	1/6	4/6	0.333
2/6	1/6	3/6	0.444
3/6	1/6	2/6	0.500
4/6	1/6	1/6	0.533
1/7	1/7	5/7	0.286
1/7	2/7	4/7	0.381
1/8	1/8	6/8	0.250
2/7	2/7	3/7	0.571
2/8	2/8	4/8	0.500
3/9	3/9	3/9	0.667
4/10	4/10	2/10	0.800
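
The values in Tables 1 and 2 follow directly from equation (2). A minimal sketch that recomputes them from the listed fractions is given below; the variable names are chosen for this illustration.

```python
from fractions import Fraction as F


def cs(p1, p2):
    """Equation (2) for one publication."""
    return 4 * p1 * p2 / (p1 + p2)


# (p_C1, p_C2) pairs from Table 1; no other countries are involved.
table1 = [(F(1, 5), F(4, 5)), (F(2, 5), F(3, 5)), (F(1, 6), F(5, 6)),
          (F(2, 6), F(4, 6)), (F(3, 6), F(3, 6))]

# (p_C1, p_C2) pairs from Table 2; the remainder goes to other countries.
table2 = [(F(1, 6), F(1, 6)), (F(2, 6), F(1, 6)), (F(3, 6), F(1, 6)),
          (F(4, 6), F(1, 6)), (F(1, 7), F(1, 7)), (F(1, 7), F(2, 7)),
          (F(1, 8), F(1, 8)), (F(2, 7), F(2, 7)), (F(2, 8), F(2, 8)),
          (F(3, 9), F(3, 9)), (F(4, 10), F(4, 10))]

cs_table1 = [round(float(cs(p1, p2)), 3) for p1, p2 in table1]
cs_table2 = [round(float(cs(p1, p2)), 3) for p1, p2 in table2]

for (p1, p2), value in zip(table1 + table2, cs_table1 + cs_table2):
    # Fractions are printed in lowest terms, e.g. 2/6 appears as 1/3.
    print(f"p_C1={p1}  p_C2={p2}  p_O={1 - p1 - p2}  cs={value:.3f}")
```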

5.2 A small real-world example

As we do not have a program yet to do the complete data gathering we chose a small real-world example, doing all data gathering and calculations by hand and with the help of an Excel file. Data were collected from the Web of Science (WoS) as available at the KU Leuven, Belgium. We included the Proceedings but not the Book citation indexes. A search was performed on August 2, 2020 for CU=(Netherland* OR Holland) and CU=Belgium, within the two subject categories (SC) Mathematics and Applied Mathematics. This search was performed for PY=2016–2017 and for PY=2018–2019. Our aim is to find the cs- and CI-values for the pair Belgium-the Netherlands over these fields.

Fractional counting was performed as in formula (1). The number of authors (N) determines the basic fractions (each author receives a fraction 1/N for their country). If an author has several addresses then this fraction is further subdivided according to the number of addresses. If, for example, an author has five addresses, two in the Netherlands, one in Belgium, one in the United States and one in Germany, then their contribution to the Netherlands is 2/(5N), to Belgium it is 1/(5N) and to other countries it is 2/(5N). We used addresses as given in the WoS even if it was clear that they referred to the same physical space: clearly, the author played different roles (for different organizations) and it was considered important to mention this in the byline.

Results are given below, but the numbers have only limited importance. This example is just a small feasibility study. It gave us some experience with the practical difficulties in calculating cs-values. We note that a few articles were classified as Mathematics and also as Applied Mathematics: these are included twice. Moreover, some articles published in 2020 but with an EA (Early Access Date) in 2019 were retrieved by PY=2019. This follows from the new way in which PY= works in the WoS. These articles too are included.

In these fields, Belgium produced slightly more than the Netherlands. Joint work is increasing and is higher in applied mathematics than in (pure) mathematics. Yet, in all cases co-authorship between the two countries is low.

Table 3

Basic data: number of retrieved publications over the period 2016–2019 (full counting).

Field and period	Belgium	The Netherlands	Joint work	% of joint work: Belgium	% of joint work: the Netherlands
Mathematics: 2016–2017	606	489	12	1.98	2.45
Mathematics: 2018–2019	588	558	14	2.38	2.51
Applied Mathematics: 2016–2017	632	586	21	3.32	3.58
Applied Mathematics: 2018–2019	615	631	24	3.90	3.80

Table 4

Total fractional contributions and cs-values.

Field and period	Average number of authors	Belgium	The Netherlands	Other countries	Total	cs-value	Average cs-value per publication
Mathematics: 2016–2017	2.58	4.292	4.708	3.000	12	8.882	0.740
Mathematics: 2018–2019	3.07	4.300	5.133	4.567	14	8.733	0.624
Applied Mathematics: 2016–2017	4.52	9.514	6.394	5.092	21	13.962	0.665
Applied Mathematics: 2018–2019	4.58	7.400	8.200	8.401	24	14.146	0.589

The average number of authors per publication is, not surprisingly, higher in applied mathematics than in mathematics, and seems to increase (based on two periods only). The relative contribution by other countries seems to be on the rise too. Mathematics provides an example where the absolute number of co-authored articles increases but the cs-value decreases. Yet, the general rule is that the more joint publications, the higher the co-authorship score. For both fields the average cs-value per publication decreased over the two periods. Quite a lot of researchers have several addresses, often in different countries. As mathematics is a field where co-authorship is generally low, this already shows that data collection for fractional scores is no sinecure.

Table 5 shows the calculation and resulting values for the co-authorship intensity of the two countries, in two fields and over two periods. For mathematics as well as for applied mathematics the CI-values decrease. As we have no experience with this new indicator we cannot discuss the meaning of the obtained values, yet the decreasing trend might not be surprising. Our indicators consist of two parts: a part directly depending on the bilateral relation between the two countries under study and a part related to the contribution of other countries. As international collaboration has increased over the years—and certainly within the European Union—a decrease of cs- and CI-values is to be expected.

Table 5

Co-authorship intensity values: CI (Belgium, the Netherlands).

Year	Mathematics: 2016–2017	Mathematics: 2018–2019	Applied mathematics: 2016–2017	Applied mathematics: 2018–2019
CI calculations	8.882/12	8.733/14	13.962/21	14.146/24
CI-value	0.74	0.62	0.66	0.59
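
The CI-values in Table 5 can be recomputed from the cs-values in Table 4 and the number of joint publications in Table 3, as in the short sketch below (the dictionary layout is an assumption of this sketch). Since FCCI is the number of joint publications, CI coincides here with the average cs-value per publication reported in Table 4.

```python
# (total cs-value from Table 4, number of joint publications from Table 3)
rows = {
    "Mathematics: 2016-2017": (8.882, 12),
    "Mathematics: 2018-2019": (8.733, 14),
    "Applied Mathematics: 2016-2017": (13.962, 21),
    "Applied Mathematics: 2018-2019": (14.146, 24),
}

# Equation (4): CI = cs / FCCI, rounded to two decimals as in Table 5.
ci_values = {label: round(cs_total / fcci, 2)
             for label, (cs_total, fcci) in rows.items()}

for label, ci in ci_values.items():
    print(f"{label}: CI = {ci:.2f}")
```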

Discussion

The introduction of these co-authorship indicators is just a start. It goes without saying that a next step must be a large-scale application so that experience with the practical meaning of cs- and CI-values can be obtained. For instance, studying the Spearman correlation between the ranking of a given country's partners based on cs-values and the ranking based on FCCI-values may be a first step. Finding out which of the collaborating countries change rank over time, and why, might be a more interesting application.

Yet, much more can and should be done. We only studied bilateral relations. In this, the basic point of departure is that the most intense collaboration, as reflected through co-authorship, is the perfectly balanced case in which only the two countries under investigation collaborate and this in an equal way. Moreover, our approach is insensitive with respect to other countries. This immediately leads to the problem of finding indicators, within a fractional approach, of multilateral relations.

A reviewer rightly pointed out that, moreover, one should distinguish between the measurement of occurrence, with its corresponding indicators, and the measurement of contribution. Reviewers further pointed out that country size may influence our indicators, suggesting size normalization as a possible solution.

Another reviewer pointed out that, in our example, a contribution of an author with addresses in different countries is counted as a collaboration between these countries. This observation is not directly related to the definition of our new indicators, but in each concrete case one must nevertheless decide whether such a case counts as a collaboration or not. We are in favor of counting even a single-authored paper by an author with multiple addresses as an international collaboration. Indeed, as institutes in different countries hired this scientist, a clear international link is present. Yet, this is only an opinion, and an investigation of whether including such cases makes a difference would be of interest. The result would probably depend on the discipline.

Finally, we note that, in this article, we did not try to gauge citation scores derived from co-authorship. This may add another layer of complexity (Smolinsky & Lercher, 2020).

Conclusion

By introducing two new indicators we add a new layer to the study of co-authorship and hence of collaboration. We note that these indicators for the co-authorship of two entities such as countries, do not take the network aspect into account (at least not in any direct way). These indicators use fractions, but no attempt is made to integrate so-called modified fractional counting (Sivertsen, Rousseau, & Zhang, 2019) into this approach. At the moment, we think that it is not obvious how to do this in a way that is clearly meaningful.

We proposed a co-authorship measure in the context of p_C₁ + p_C₂ + p_O = 1. Yet, one may imagine a generalization in which positive parameters α, β, γ are used, leading to αp_C₁ + βp_C₂ + γp_O = 1. We think though that such a generalization would make the basic theory needlessly complicated. Yet, such parameters may play a role in normalization attempts.

Finally, we pointed out a number of research problems which we intend to tackle in the near future.

eISSN: 2543-683X
Language: English

Publication frequency: 4 times per year
Journal subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining


Article Category: Research Paper

Published online: 23 Oct 2020

Pages: 1 - 12

Received: 07 Aug 2020

Accepted: 10 Oct 2020

DOI: https://doi.org/10.2478/jdis-2021-0005

Keywords: Collaboration, Country studies, Fractional counting, Harmonic mean, Co-authorship intensity

© 2021 Ronald Rousseau et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
