In this contribution we provide two new co-authorship indicators based on fractional counting.

Starting from the idea of fractional counting, we reflect on what an acceptable indicator for co-authorship between two entities should be. From this reflection we propose an indicator, the co-authorship score, denoted as cs, based on the harmonic mean. Dividing this new indicator by the classical co-authorship indicator based on full counting leads to a co-authorship intensity indicator.

We show that the indicators we propose have many necessary or at least highly desirable properties for a proper cs-score. It is pointed out that the two new indicators can be used for countries, but also for institutions and other pairs of entities. A small example shows the feasibility of the co-authorship score and the co-authorship intensity indicator.

The indicators have not yet been tested in large-scale real cases.

As the notions of co-authorship and collaboration have many aspects, we think that our contribution may help policy management to take yet another aspect into account as part of a multi-faceted description of research outcomes.

The indicators we propose cover yet another aspect of co-authorship.

#### Keywords

- Collaboration
- Country studies
- Fractional counting
- Harmonic mean
- Co-authorship intensity

It is well known that collaboration between scientists, and consequently between institutes, countries, and sectors, has increased considerably over recent decades (Glänzel & Schubert, 2004; Wuchty, Jones, & Uzzi, 2007). Yet the problem of how to measure collaboration has not been solved. It is customary to operationalize the notion of collaboration through the notion of co-authorship, although colleagues have pointed out that this is not quite accurate (Katz & Martin, 1997). We recall that Sonnenwald (2007) defines research collaboration as “the interaction taking place within a social context among two or more scientists that facilitates the sharing of meaning and completion of tasks with respect to a mutually shared, superordinate goal.”

In this short contribution we do not try to provide an overview of the many publications related to scientific collaboration, invisible colleges, interdisciplinarity, and team science; we refer the interested reader to, among others, the following publications (Liu et al., 2020; Rousseau, Zhang, & Hu, 2019; Sonnenwald, 2007; Wagner et al., 2011). Instead, we come straight to the point and provide an indicator for co-authorship that we consider more refined than existing ones, at least in a certain context (explained further on), mainly because it takes fractional counting into account in a subtle way.

Throughout, we assume a fixed database D and consider publications published during a fixed publication window.

Country co-authorship is typically a network property. This aspect may be used explicitly, by drawing actual co-authorship networks and using network indicators, or it may be downplayed, e.g. by just providing a ranked list of the countries with which a given country collaborates or has collaborated in the past. Moreover, this network may be weighted or not, and if weighted, one may use natural numbers (whole counting) or any positive real numbers (different variations on fractional counting). Most, but not all, of the older investigations used whole counting and did not use network indicators other than degree centrality in an undirected network (Frame & Carpenter, 1979; Luukkonen, Persson, & Sivertsen, 1992; Luukkonen et al., 1993; Narin, Stevens, & Whitlow, 1991; Russell, 1995; Schubert & Braun, 1990).

A recent article by Perianes-Rodriguez, Waltman, and van Eck (2016) provides a framework for at least some co-authorship studies. These authors made a clear distinction between indicators for a phenomenon (here collaboration between countries as measured through joint publications) and indicators for the co-authorship network. Admitting that the network point of view is often a valid one (Leydesdorff & Park, 2017; Park, Yoon, & Leydesdorff, 2016), we nevertheless focus on finding an indicator for collaboration between two entities without taking the network aspect into account. Before continuing, we point out that there is yet another point of view in co-authorship studies. One may try to find an indicator for the global “collaborativeness” of a set of publications, ranging from the simple percentage of co-authored publications in the set to the advanced Egghe-English approach (Egghe, 1991; English, 1991; Rousseau, 2011). As said above, our article focuses on an indicator for co-authorship between two entities.

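Consider a publication with N authors. If every author has all of his or her addresses in a single country, the fractional contribution of country C to this publication is $p_{C} = AU/N$, where AU is the number of authors with all addresses in country C.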

It may, however, happen that some authors have addresses in more than one country. Then $p_{C}$ must be adapted. We proceed as follows: if AU authors with at least one address in country C contribute to this N-author publication, then the fractional contribution of country C to this publication is

$$p_{C} = \frac{1}{N}\sum_{j=1}^{AU}\frac{a_{j}}{a_{j}+b_{j}} \qquad (1)$$

where $a_{j}$ is the number of addresses of author j in country C, and $b_{j}$ is the number of addresses of author j not situated in country C. Allowing $b_{j} = 0$, formula (1) is always valid, whether or not author j has addresses outside country C.

If $p_{K}$ represents the fractional contribution of a country K in a fixed article, then $\sum_{K} p_{K} = 1$, where the sum runs over all countries K.

Finally, the fractional production of country C in the set under investigation is the sum of all fractional scores of all publications involving country C.
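
A minimal Python sketch of formula (1) and of the fractional production, assuming each author is represented by a list holding the country of every address (with repetitions, so an author with two Dutch addresses lists "NL" twice). The helper names `fractional_contributions` and `fractional_production` and this input format are hypothetical, not part of any existing package. Exact fractions make the property that all contributions sum to one checkable without rounding error.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_contributions(authors):
    """Fractional contribution p_C of every country C to one publication.

    `authors` holds one entry per author (so N = len(authors)); each entry
    lists the country of every address of that author, e.g. ["NL", "NL", "BE"].
    Each author carries a weight of 1/N, split equally over his or her
    addresses. Every author is assumed to have at least one address.
    """
    n = len(authors)
    shares = defaultdict(Fraction)  # Fraction() == 0
    for addresses in authors:
        for country in addresses:
            shares[country] += Fraction(1, n * len(addresses))
    return dict(shares)


def fractional_production(publications, country):
    """Fractional production of `country`: sum of its scores over all publications."""
    return sum(
        (fractional_contributions(authors).get(country, Fraction(0)) for authors in publications),
        Fraction(0),
    )


# Hypothetical two-author publication: the first author has five addresses
# (two in the Netherlands, one each in Belgium, the United States and Germany);
# the second author has a single Belgian address.
publication = [["NL", "NL", "BE", "US", "DE"], ["BE"]]
p = fractional_contributions(publication)
print(p["NL"], p["BE"], p["US"], sum(p.values()))  # 1/5 3/5 1/10 1
```

With N = 2, the first author's weight of 1/2 is split into five parts of 1/10, so the Netherlands receives 2/10 and Belgium 1/10 + 1/2.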

Let A be the set of all articles in database D with at least one author with an address in country C_{1} and let #A be the number of articles in A.

Let B be the set of all articles in database D with at least one author with an address in country C_{2} and let #B be the number of articles in B.

Then A∩B is the set of all articles with at least one address in country C_{1} and at least one address in country C_{2}; #(A∩B) is then the absolute value of the co-authorship indicator (co-authorship between countries C_{1} and C_{2}) using full counting. We denote this indicator as FCCI(C_{1},C_{2}).

A more interesting indicator is #(A∩B)/#A, the share of C_{1}–C_{2} co-authorships (in the sense of classical whole counting) among all C_{1}-publications. Yet, even if #(A∩B) stays constant, this share can decrease because country C_{1} works more on its own, or because country C_{1} collaborates more with other countries. On its own, this indicator therefore cannot be interpreted correctly; more information is necessary. Similarly, and of equal interest, #(A∩B)/#B is the share of C_{1}–C_{2} co-authorships among all C_{2}-publications.
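
The full-counting quantities FCCI(C_{1},C_{2}) = #(A∩B), #(A∩B)/#A and #(A∩B)/#B follow directly from set operations. The sketch below assumes each publication is reduced to the set of countries in its address list; the function name `full_counting` and the publication identifiers are hypothetical.

```python
def full_counting(publications, c1, c2):
    """Return FCCI(c1, c2), #(A∩B)/#A and #(A∩B)/#B under full counting.

    `publications` maps a publication identifier to the set of countries in
    its address list (an assumed input format).
    """
    a = {pid for pid, countries in publications.items() if c1 in countries}
    b = {pid for pid, countries in publications.items() if c2 in countries}
    joint = a & b  # A ∩ B: at least one address in c1 and at least one in c2
    fcci = len(joint)
    share_a = fcci / len(a) if a else 0.0
    share_b = fcci / len(b) if b else 0.0
    return fcci, share_a, share_b


# Hypothetical publication set
pubs = {
    "p1": {"BE"},
    "p2": {"BE", "NL"},
    "p3": {"NL", "DE"},
    "p4": {"BE", "NL", "FR"},
    "p5": {"BE"},
}
print(full_counting(pubs, "BE", "NL"))  # (2, 0.5, 0.6666666666666666)

# One more Belgian paper without Dutch co-authors: FCCI is unchanged,
# but the share among Belgian publications drops.
pubs["p6"] = {"BE"}
print(full_counting(pubs, "BE", "NL"))  # (2, 0.4, 0.6666666666666666)
```

The drop from 0.5 to 0.4 illustrates the ambiguity discussed above: the share alone does not reveal whether C_{1} worked more on its own or collaborated more with others.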

We start from the idea that the most intense co-authorship pattern between two countries in one article occurs when the two countries contribute equally (the perfectly balanced case) and no other country contributes. We give this situation a score of 1. In the purely theoretical case that two countries never publish on their own, always publish co-authored publications, and always in a perfectly balanced way, i.e. $p_{C_1} = p_{C_2}$ (here and further on, the values $p_{C}$ are calculated as proposed in equation (1)), their total co-authorship score is equal to their total number of publications. We denote the co-authorship score of two different countries $C_1$ and $C_2$ in one publication by $cs(C_1,C_2)$.

Definition: co-authorship score of two different countries C_{1} and C_{2} in one publication.

$$cs(C_1,C_2) = 2\cdot\frac{2\,p_{C_1}\,p_{C_2}}{p_{C_1}+p_{C_2}} = \frac{4\,p_{C_1}\,p_{C_2}}{p_{C_1}+p_{C_2}} \qquad (2)$$

The formula for cs is two times the harmonic mean of the fractional contributions of countries $C_1$ and $C_2$. We multiply the harmonic mean by two so that in the perfectly balanced case $cs(C_1,C_2)$ receives a score of 1 (and not 0.5). This is just a practical convention and is not essential for comparing scores over time or for different countries. If $p_{C_1} + p_{C_2} = 1$, then $cs(C_1,C_2) = 4\,p_{C_1}\,p_{C_2}$.
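
A short sketch of equation (2), written literally as two times the harmonic mean, checked against the rows of the table of examples without other countries (where $p_{C_1} + p_{C_2} = 1$, so the score reduces to $4\,p_{C_1}\,p_{C_2}$). The function name `cs_one_publication` is our own, and exact fractions are used only to make the comparison with the shortcut exact.

```python
from fractions import Fraction


def cs_one_publication(p1, p2):
    """Equation (2): two times the harmonic mean of p1 and p2, i.e. 4*p1*p2/(p1 + p2).

    p1 and p2 are the fractional contributions of the two countries to one
    publication (formula (1)); p1 + p2 > 0 is assumed.
    """
    harmonic_mean = 2 * p1 * p2 / (p1 + p2)
    return 2 * harmonic_mean


# Rows of the table "Examples where no other countries are involved",
# given as (numerator, denominator) pairs for p1 and p2.
rows = [(1, 5, 4, 5), (2, 5, 3, 5), (1, 6, 5, 6), (2, 6, 4, 6), (3, 6, 3, 6)]
for a, b, c, d in rows:
    p1, p2 = Fraction(a, b), Fraction(c, d)
    score = cs_one_publication(p1, p2)
    assert score == 4 * p1 * p2  # shortcut valid only because p1 + p2 == 1
    print(f"{a}/{b}  {c}/{d}  {float(score):.3f}")
# last column printed: 0.640, 0.960, 0.556, 0.889, 1.000
```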

Definition: co-authorship score, cs, of two different countries C_{1} and C_{2} for a given set of publications.

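If PUB is the number of publications under consideration and $cs_k(C_1,C_2)$ denotes the score of publication k according to equation (2), then the co-authorship score $cs(C_1,C_2)$ for the set is defined as:

$$cs(C_1,C_2) = \sum_{k=1}^{PUB} cs_k(C_1,C_2) \qquad (3)$$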

We introduced the new cs-score without justification. Next we list a number of good properties of cs; this list serves as the argument for introducing the new cs-score. Indeed, we think that the properties shown in this list are necessary, or at least highly desirable, for a proper cs-score. We first note that if a $C_1$–$C_2$ co-authored publication is added to the set under consideration, then $cs(C_1,C_2)$ as given by equation (3) increases. So, besides the full counting score, the fractional counting score also increases.
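
Equation (3) is a plain sum of per-publication scores, so adding a joint publication with positive contributions from both countries adds a strictly positive term. A minimal sketch, assuming each publication is given as a (p_C1, p_C2) pair; the function names and the three-publication set are hypothetical, and returning 0 when neither country contributes is our own convention.

```python
from fractions import Fraction


def cs_publication(p1, p2):
    """Equation (2) for one publication: 4*p1*p2/(p1 + p2).

    Returns 0 when neither country contributes; this convention is an
    assumption and never applies to publications in A ∩ B.
    """
    if p1 + p2 == 0:
        return Fraction(0)
    return 4 * p1 * p2 / (p1 + p2)


def cs_set(contributions):
    """Equation (3): sum of the per-publication scores over the set.

    `contributions` lists one (p_C1, p_C2) pair per publication (an assumed
    input format); a publication in which only one of the two countries
    appears contributes 0.
    """
    return sum((cs_publication(p1, p2) for p1, p2 in contributions), Fraction(0))


publications = [
    (Fraction(1, 2), Fraction(1, 2)),  # perfectly balanced: score 1
    (Fraction(1, 3), Fraction(1, 3)),  # one third from other countries: 2/3
    (Fraction(1), Fraction(0)),        # C1 on its own: score 0
]
before = cs_set(publications)
publications.append((Fraction(1, 4), Fraction(1, 4)))  # add a joint publication
after = cs_set(publications)
print(before, after)  # 5/3 13/6
```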

The cs-value as defined in equation (2), i.e. for one publication, has the following properties.

P1) For all countries C_{1}, C_{2} and for all publications: cs(C_{1},C_{2}) = cs(C_{2},C_{1}); cs is a symmetric measure. This symmetry property also holds for equation (3).

P2) For all countries C_{1} and C_{2} and for all publications: 0 ≤ cs(C_{1},C_{2}) ≤ 1; cs is a non-negative measure, with upper bound equal to one.

P3) The co-authorship score $cs(C_1,C_2) = 1$ if and only if $p_{C_1} = p_{C_2} = 0.5$; this is the upper bound for the co-authorship score of two countries in one publication. This upper bound corresponds to the perfectly balanced case.

P4) If for countries $C_1$ and $C_2$ we have $p_{C_1} \le p_{C_2}$, then for all countries $C_3$: $cs(C_1,C_3) \le cs(C_2,C_3)$, which is a monotonicity property.

P5) The cs-value of countries C_{1} and C_{2} in one publication does not depend on how many (one, two or more) other countries and which other countries contribute to a joint publication. This is a form of anonymity property. Below we provide some comments on this property.

P6) Continuity

We can write $cs(C_1,C_2)$ as

$$cs(C_1,C_2) = \frac{4\,p_{C_1}\,p_{C_2}}{p_{C_1}+p_{C_2}}.$$

Denoting the total contribution of countries other than country $C_1$ or country $C_2$ by $p_{O}$ (O for “other”), we always have $p_{C_1} + p_{C_2} + p_{O} = 1$. Hence, we can rewrite $cs(C_1,C_2)$ as

$$cs(C_1,C_2) = \frac{4\,p_{C_1}\,p_{C_2}}{1-p_{O}}.$$

Consequently, $cs(C_1,C_2)$ can be written as a function of two independent variables, either $p_{C_1}$ and $p_{C_2}$ or $p_{C_1}$ and $p_{O}$ (and of course also as a function of $p_{C_2}$ and $p_{O}$), where these two variables are always between zero and one and their sum is smaller than or equal to one. Considering these variables as real variables, we see that cs is a continuous function of two variables. Roughly speaking, this means that a small change in any of these variables leads to a small change in $cs(C_1,C_2)$.

P7) Next we show that the more uneven the contribution of the two countries, the smaller their cs-value. Conversely, when the two countries have an equal contribution, then their cs-value reaches a maximum, depending on p_{O}.

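Assume that $p_{O}$ (< 1) is fixed. Substituting $p_{C_2} = 1 - p_{O} - p_{C_1}$ gives

$$cs(C_1,C_2) = \frac{4\,p_{C_1}\,(1-p_{O}-p_{C_1})}{1-p_{O}},$$

so $cs(C_1,C_2)$ increases for $0 \le p_{C_1} \le \frac{1-p_{O}}{2}$ and decreases for $\frac{1-p_{O}}{2} \le p_{C_1} \le 1-p_{O}$. As a function of $p_{C_1}$ its graph is a parabola with top in the point with coordinates $\left(\frac{1-p_{O}}{2},\, 1-p_{O}\right)$. If $p_{O} = 0$, this top is situated in the point (0.5, 1).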

P8) In this last property we show that when the relative contributions of the two countries are given, then their cs-value depends on p_{O}: the larger p_{O} the smaller their cs-value.

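In the previous property we kept $p_{O}$ (< 1) fixed. Now we keep the ratio $P = p_{C_1}/p_{C_2}$ of the contributions of the two countries fixed. Then, the larger $p_{O}$, the smaller $cs(C_1,C_2)$, which is again a property one may expect to hold for a proper co-authorship measure for two countries.

Proof:
As $P = p_{C_1}/p_{C_2}$, we have $p_{C_1} = P\,p_{C_2}$ and hence

$$cs(C_1,C_2) = \frac{4\,P\,p_{C_2}^{2}}{(1+P)\,p_{C_2}} = \frac{4P}{1+P}\,p_{C_2}.$$

Since $p_{C_1} + p_{C_2} + p_{O} = 1$, we have here $(1+P)\,p_{C_2} + p_{O} = 1$, or $p_{C_2} = \frac{1-p_{O}}{1+P}$. Consequently

$$cs(C_1,C_2) = \frac{4P}{(1+P)^{2}}\,(1-p_{O}),$$

which, for fixed P, strictly decreases as $p_{O}$ increases.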

We note that if P = 1 ($C_1$ and $C_2$ contribute equally), then $cs(C_1,C_2) = 1 - p_{O}$, which is the top value found in P7.
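
The properties above can be sanity-checked numerically. The sketch below evaluates equation (2) with exact fractions on a grid of contributions with step 1/12 (an arbitrary choice) and asserts P1 to P4, P7 and P8, together with the rewritings used for P6 and P8. A finite grid check is not a proof; P5 holds by construction (the function only takes $p_{C_1}$ and $p_{C_2}$), and continuity (P6) cannot be verified on a grid.

```python
from fractions import Fraction
from itertools import product


def cs(p1, p2):
    """Equation (2) for one publication; assumes p1 + p2 > 0."""
    return 4 * p1 * p2 / (p1 + p2)


STEP = Fraction(1, 12)  # grid resolution: an arbitrary choice for this check
SHARES = [k * STEP for k in range(1, 13)]
# All exact (p_C1, p_C2) with positive entries and p_C1 + p_C2 <= 1
GRID = [(p1, p2) for p1, p2 in product(SHARES, repeat=2) if p1 + p2 <= 1]

for p1, p2 in GRID:
    p_o = 1 - p1 - p2
    value = cs(p1, p2)
    assert value == cs(p2, p1)                                # P1: symmetry
    assert 0 <= value <= 1                                    # P2: bounds
    assert (value == 1) == (p1 == p2 == Fraction(1, 2))       # P3: maximum iff balanced
    assert value == 4 * p1 * p2 / (1 - p_o)                   # rewriting used in P6
    ratio = p1 / p2
    assert value == 4 * ratio / (1 + ratio) ** 2 * (1 - p_o)  # closed form from P8

# P4: for a fixed partner share p3, cs is non-decreasing in the other share.
for p3 in SHARES:
    scores = [cs(p, p3) for p in SHARES if p + p3 <= 1]
    assert scores == sorted(scores)

# P7: for fixed p_O the top is at p_C1 = (1 - p_O)/2 with value 1 - p_O.
for p_o in [k * STEP for k in range(12)]:
    rest = 1 - p_o
    top = cs(rest / 2, rest / 2)
    assert top == rest
    assert all(cs(p1, rest - p1) <= top for p1 in SHARES if p1 < rest)

# P8: for a fixed ratio P = p_C1/p_C2, cs strictly decreases as p_O grows.
for ratio in (Fraction(1), Fraction(2), Fraction(1, 3)):
    scores = []
    for k in range(12):
        p2 = (1 - k * STEP) / (1 + ratio)
        scores.append(cs(ratio * p2, p2))
    assert all(x > y for x, y in zip(scores, scores[1:]))

print(f"P1-P4, P7, P8 hold on all {len(GRID)} grid points")  # ... all 66 grid points
```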

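We propose the following indicator as a co-authorship intensity indicator, denoted as CI(C_{1},C_{2}):

$$CI(C_1,C_2) = \frac{cs(C_1,C_2)}{FCCI(C_1,C_2)},$$

where $cs(C_1,C_2)$ is given by equation (3) and $FCCI(C_1,C_2) = \#(A\cap B)$.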

On the one hand, two countries may co-author often (FCCI is high), while this collaboration usually involves many other countries or is highly asymmetric (most authors belong to one country). Then the CI-index is small. If, on the other hand, scientists of the two countries mostly work together without third parties and in a balanced way, then the intensity index is relatively high. We write “relatively” because in real situations CI will rarely be close to one.
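
The contrast described above can be made concrete with two hypothetical country pairs that have the same FCCI but very different collaboration patterns. In this sketch each joint publication is given as a (p_C1, p_C2) pair; the function names and all values are illustrative assumptions.

```python
def cs_one_publication(p1, p2):
    """Equation (2): 4*p1*p2/(p1 + p2) for one publication (p1 + p2 > 0)."""
    return 4 * p1 * p2 / (p1 + p2)


def intensity(contributions):
    """CI = cs / FCCI for a list of (p_C1, p_C2) pairs, one per joint publication.

    Every listed publication is assumed to be co-authored by both countries,
    so FCCI is simply the number of pairs.
    """
    fcci = len(contributions)
    cs_total = sum(cs_one_publication(p1, p2) for p1, p2 in contributions)
    return cs_total / fcci


# Two hypothetical country pairs, each with FCCI = 4 joint publications.
balanced_bilateral = [(0.5, 0.5), (0.5, 0.5), (0.4, 0.6), (0.5, 0.5)]
asymmetric_multilateral = [(0.1, 0.2), (0.05, 0.45), (0.1, 0.1), (0.2, 0.1)]

ci_balanced = intensity(balanced_bilateral)
ci_multilateral = intensity(asymmetric_multilateral)
print(f"{ci_balanced:.2f} {ci_multilateral:.2f}")  # 0.99 0.23
```

Both pairs have the same full-counting score, yet CI separates the mostly bilateral, balanced collaboration from the asymmetric one dominated by third countries.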

We reconsider the anonymity property P5. Of course the cs-value depends on the fractions $p_{C_1}$, $p_{C_2}$ and $p_{O}$, with $p_{C_1} + p_{C_2} + p_{O} = 1$, but what we mean is that the cs-value does not depend on how many other countries are involved, and certainly not on which countries these are. We admit that, depending on the type of investigation, this information might be useful; yet other types of indicators are needed to capture it.

Until now we have talked about countries. Yet, the formula for countries can also be used within a university, where the role of countries is played by departments or schools. Formulae (2) and (3) can even be used for scientists, especially if a scientist's contribution to an N-author publication is not necessarily equal to 1/N but can be any number strictly larger than zero.

For each concrete investigation one must decide whether two (or more) addresses in one university count as one address or as several. The decision could perhaps be based on the postal code, but sometimes the same building hosts different administrative units. We do not go into these practical difficulties, but just mention them for completeness’ sake. In the real-world example shown further on, we use addresses as provided in the Web of Science (WoS).

These data are presented to illustrate numerical values resulting from the definition of this new indicator.

The last row of Table 1 illustrates the perfectly balanced case: the two countries contribute equally and no other country contributes. The first row of Table 2 illustrates a three-country collaborative publication in which the two target countries contribute equally (as do several other rows).

Table 1. Examples where no other countries are involved.

$p_{C_1}$ | $p_{C_2}$ | $cs(C_1,C_2)$ |
---|---|---|
1/5 | 4/5 | 0.640 |
2/5 | 3/5 | 0.960 |
1/6 | 5/6 | 0.556 |
2/6 | 4/6 | 0.889 |
3/6 | 3/6 | 1.000 |

Table 2. Examples where other countries are involved.

$p_{C_1}$ | $p_{C_2}$ | $p_{O}$ | $cs(C_1,C_2)$ |
---|---|---|---|
1/6 | 1/6 | 4/6 | 0.333 |
2/6 | 1/6 | 3/6 | 0.444 |
3/6 | 1/6 | 2/6 | 0.500 |
4/6 | 1/6 | 1/6 | 0.533 |
1/7 | 1/7 | 5/7 | 0.286 |
1/7 | 2/7 | 4/7 | 0.381 |
1/8 | 1/8 | 6/8 | 0.250 |
2/7 | 2/7 | 3/7 | 0.571 |
2/8 | 2/8 | 4/8 | 0.500 |
3/9 | 3/9 | 3/9 | 0.667 |
4/10 | 4/10 | 2/10 | 0.800 |
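
To make the arithmetic behind the examples with other countries checkable, the sketch below recomputes every row with exact fractions from equation (2), confirms that the same value follows from the form $4\,p_{C_1}\,p_{C_2}/(1-p_{O})$, and compares the result with the printed three-decimal value. The tuple layout is our own transcription of the table.

```python
from fractions import Fraction as F


def cs(p1, p2):
    """Equation (2): two times the harmonic mean of p1 and p2."""
    return 4 * p1 * p2 / (p1 + p2)


# (p_C1, p_C2, p_O, printed cs-value), transcribed from the table above
TABLE = [
    (F(1, 6), F(1, 6), F(4, 6), 0.333),
    (F(2, 6), F(1, 6), F(3, 6), 0.444),
    (F(3, 6), F(1, 6), F(2, 6), 0.500),
    (F(4, 6), F(1, 6), F(1, 6), 0.533),
    (F(1, 7), F(1, 7), F(5, 7), 0.286),
    (F(1, 7), F(2, 7), F(4, 7), 0.381),
    (F(1, 8), F(1, 8), F(6, 8), 0.250),
    (F(2, 7), F(2, 7), F(3, 7), 0.571),
    (F(2, 8), F(2, 8), F(4, 8), 0.500),
    (F(3, 9), F(3, 9), F(3, 9), 0.667),
    (F(4, 10), F(4, 10), F(2, 10), 0.800),
]

for p1, p2, p_o, printed in TABLE:
    assert p1 + p2 + p_o == 1                  # the three shares exhaust the publication
    exact = cs(p1, p2)
    assert exact == 4 * p1 * p2 / (1 - p_o)    # same value via the p_O form
    assert abs(float(exact) - printed) < 5e-4  # agrees with the table to 3 decimals
    print(f"{float(exact):.3f}")
```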

As we do not yet have a program for complete data gathering, we chose a small real-world example, doing all data gathering and calculations by hand with the help of an Excel file. Data were collected from the Web of Science (WoS) as available at KU Leuven, Belgium. We included the Proceedings citation indexes but not the Book citation indexes. A search was performed on August 2, 2020 for CU=(Netherland* OR Holland) and CU=Belgium within the two subject categories (SC) Mathematics and Applied Mathematics, for PY=2016–2017 and for PY=2018–2019. Our aim is to find the cs- and CI-values for the pair Belgium–the Netherlands in these fields.

Fractional counting was performed as in formula (1). The number of authors (N) determines the basic fractions (each author receives a fraction 1/N for their country). If an author has several addresses then this fraction is further subdivided according to the number of addresses. If, for example, an author has five addresses, two in the Netherlands, one in Belgium, one in the United States and one in Germany, then their contribution to the Netherlands is 2/(5N), to Belgium it is 1/(5N) and to other countries it is 2/(5N). We used addresses as given in the WoS even if it was clear that they referred to the same physical space: clearly, the author played different roles (for different organizations) and it was considered important to mention this in the byline.

Results are given below, but the numbers themselves are of limited importance: this example is just a small feasibility study. It gave us some experience with the practical difficulties of calculating cs-values. We note that a few articles were classified both as Mathematics and as Mathematics Applied; these are included twice. Moreover, some articles published in 2020 but with an EA (Early Access date) in 2019 were retrieved by PY=2019, in line with the new way in which PY= works in the WoS. These articles are included too.

In these fields, Belgium produced slightly more than the Netherlands. Joint work is increasing and is higher in applied mathematics than in (pure) mathematics. Yet, in all cases co-authorship between the two countries is low.

Table 3. Basic data: number of retrieved publications over the period 2016–2019 (full counting).

Field: period | Belgium | The Netherlands | Joint work | Joint work as % of Belgium | Joint work as % of the Netherlands |
---|---|---|---|---|---|
Mathematics: 2016–2017 | 606 | 489 | 12 | 1.98 | 2.45 |
Mathematics: 2018–2019 | 588 | 558 | 14 | 2.38 | 2.51 |
Applied Mathematics: 2016–2017 | 632 | 586 | 21 | 3.32 | 3.58 |
Applied Mathematics: 2018–2019 | 615 | 631 | 24 | 3.90 | 3.80 |

Table 4. Total fractional contributions and cs-values.

Field: period | Average number of authors | Belgium | The Netherlands | Other countries | Total | cs-value | Average cs-value per publication |
---|---|---|---|---|---|---|---|
Mathematics: 2016–2017 | 2.58 | 4.292 | 4.708 | 3.000 | 12 | 8.882 | 0.740 |
Mathematics: 2018–2019 | 3.07 | 4.300 | 5.133 | 4.567 | 14 | 8.733 | 0.624 |
Applied Mathematics: 2016–2017 | 4.52 | 9.514 | 6.394 | 5.092 | 21 | 13.962 | 0.665 |
Applied Mathematics: 2018–2019 | 4.58 | 7.400 | 8.200 | 8.401 | 24 | 14.146 | 0.589 |

The average number of authors per publication is, not surprisingly, higher in applied mathematics than in mathematics, and seems to increase (based on two periods only). The relative contribution of other countries seems to be on the rise too. Mathematics provides an example where the absolute number of co-authored articles increased while the cs-value decreased. Yet, the general rule is that the more joint publications, the higher the co-authorship score. For both fields the average cs-value per publication decreased over the two periods. Quite a lot of researchers have several addresses, often in different countries. As mathematics is a field in which co-authorship is generally low, this already shows that collecting data for fractional scores is no sinecure.

Table 5 shows the calculation and resulting values for the co-authorship intensity of the two countries, in two fields and over two periods. For mathematics as well as for applied mathematics the CI-values decrease. As we have no experience with this new indicator we cannot discuss the meaning of the obtained values, yet the decreasing trend might not be surprising. Our indicators consist of two parts: a part directly depending on the bilateral relation between the two countries under study and a part related to the contribution of other countries. As international collaboration has increased over the years—and certainly within the European Union—a decrease of cs- and CI-values is to be expected.

Table 5. Co-authorship intensity values: CI(Belgium, the Netherlands).

Field: period | CI calculation (cs / FCCI) | CI-value |
---|---|---|
Mathematics: 2016–2017 | 8.882/12 | 0.74 |
Mathematics: 2018–2019 | 8.733/14 | 0.62 |
Applied Mathematics: 2016–2017 | 13.962/21 | 0.66 |
Applied Mathematics: 2018–2019 | 14.146/24 | 0.59 |
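
The derived columns of the basic-data table and the CI-values can be recomputed from the reported totals, since CI is the summed cs-value divided by the number of joint publications. The sketch below transcribes those totals; the variable names are illustrative only.

```python
# Totals transcribed from the basic-data and fractional-contribution tables:
# (field and period, Belgian publications #A, Dutch publications #B,
#  joint publications FCCI = #(A∩B), summed cs-value)
ROWS = [
    ("Mathematics: 2016–2017", 606, 489, 12, 8.882),
    ("Mathematics: 2018–2019", 588, 558, 14, 8.733),
    ("Applied Mathematics: 2016–2017", 632, 586, 21, 13.962),
    ("Applied Mathematics: 2018–2019", 615, 631, 24, 14.146),
]

results = {}
for label, n_be, n_nl, fcci, cs_total in ROWS:
    pct_be = 100 * fcci / n_be  # joint work as % of Belgian publications
    pct_nl = 100 * fcci / n_nl  # joint work as % of Dutch publications
    ci = cs_total / fcci        # CI = cs / FCCI, the average cs per joint publication
    results[label] = (round(pct_be, 2), round(pct_nl, 2), round(ci, 2))
    print(f"{label:32} {pct_be:5.2f} {pct_nl:5.2f} CI={ci:.2f}")
```

All recomputed values agree with the published two-decimal figures.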

The introduction of these co-authorship indicators is just a start. It goes without saying that a next step must be a large-scale application, so that experience with the practical meaning of cs- and CI-values can be obtained. For instance, a first step may be to study the Spearman correlation between the ranking of a given country's collaboration partners based on cs-values and the ranking based on FCCI-values. Finding out which of the collaborating countries change rankings over time, and why, might be a more interesting application.
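
As an illustration of the first step suggested above, the sketch computes Spearman's rank correlation between a ranking of partner countries by FCCI and a ranking by cs-value. We read the proposal as ranking the collaboration partners of one focal country; that interpretation, the partner labels and all values are hypothetical. Ties receive average ranks, and rho is then the Pearson correlation of the ranks; `scipy.stats.spearmanr` computes the same coefficient if SciPy is available.

```python
def average_ranks(values):
    """Rank values in descending order (1 = largest); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos..end
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical partners of one focal country: (FCCI, summed cs-value)
partners = {"P1": (40, 18.0), "P2": (35, 25.5), "P3": (20, 15.2), "P4": (12, 9.1), "P5": (5, 1.3)}
fcci = [v[0] for v in partners.values()]
cs_values = [v[1] for v in partners.values()]
rho = spearman(fcci, cs_values)
print(f"{rho:.2f}")  # 0.90: only P1 and P2 swap places
```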

Yet, much more can and should be done. We only studied bilateral relations. Here, the basic point of departure is that the most intense collaboration, as reflected through co-authorship, is the perfectly balanced case in which only the two countries under investigation collaborate, and do so in an equal way. Moreover, our approach is insensitive to the number and identity of the other countries involved. This immediately leads to the problem of finding indicators of multilateral relations within a fractional approach.

A reviewer rightly pointed out that one should, moreover, distinguish between the measurement (and corresponding indicators) of occurrence and the measurement of contribution. Reviewers further pointed out that country size may influence our indicators, suggesting size normalization as a possible solution.

Another reviewer pointed out that, in our example, a contribution of an author with addresses in different countries is counted as a collaboration between these countries. This observation is not directly related to the definition of our new indicators, but in each concrete case a decision must nevertheless be made whether such a case counts as a collaboration. We are in favor of counting even a single-authored paper by an author with addresses in multiple countries as an international collaboration. Indeed, as institutes in different countries hired this scientist, a clear international link is present. Yet, this is only an opinion, and it would be of interest to investigate whether including such cases makes a difference. The result would probably depend on the discipline.

Finally, we note that, in this article, we did not try to gauge citation scores derived from co-authorship. This may add another layer of complexity (Smolinsky & Lercher, 2020).

By introducing two new indicators we add a new layer to the study of co-authorship and hence of collaboration. We note that these indicators for the co-authorship of two entities, such as countries, do not take the network aspect into account (at least not in any direct way). These indicators use fractions, but no attempt is made to integrate so-called modified fractional counting (Sivertsen, Rousseau, & Zhang, 2019) into this approach. At the moment, we think that it is not obvious how to do this in a clearly meaningful way.

We proposed a co-authorship measure in the context of $p_{C_1} + p_{C_2} + p_{O} = 1$. One may, however, imagine a generalization in which positive parameters α, β, γ are used, leading to $\alpha\,p_{C_1} + \beta\,p_{C_2} + \gamma\,p_{O} = 1$. We think, though, that such a generalization would make the basic theory needlessly complicated. Yet, such parameters may play a role in normalization attempts.

Finally, we pointed out a number of research problems which we intend to tackle in the near future.
