This paper contains the results that are part of a more general study (Mryglod et al., 2021), the purpose of which is to perform a large-scale quantitative analysis of the Ukrainian Economics discipline using the publication data predominantly beyond Web of Science and Scopus databases, where currently more than 150 Ukrainian journals are indexed while the National List of recognized scientific journals
The quantitative analysis at the level of publications based on the Crossref data is rather straightforward. For example, an estimation of the number of authors per paper can be done even without sophisticated data pre-processing. However, the consideration of individual publication histories at the level of authors or gender analysis requires name disambiguation. Obviously, this task is a big challenge, especially when dealing with non-Western names (Gomide et al., 2017; Kim et al., 2021; Treeratpituk & Giles, 2012). Since the majority of publications in the Ukrainian Economic discipline are related to local authors (Mryglod et al., 2021) it is natural to find mainly Ukrainian first and last names in our data set. Although such peculiarities as the use of a middle name or a prespecified order of parts in composite names (Gomide et al., 2017; Treeratpituk & Giles, 2012) are not typical for Ukraine, a huge problem of transliteration (e.g., see also Müller et al., 2017) still exists. But there is another side of the coin: the so-called ethnicity can be used to improve gender disambiguation. Therefore, along with the initial motivation to contribute to the quantitative description of Ukrainian Economics discipline, a special emphasis is made on the methods of processing such specific bibliographic data.
The research questions here are related both to the methods of data pre-processing and the results of the analysis of these data:
RQ1: What peculiarities of Ukrainian authors’ names have to be taken into account during the process of name and gender disambiguation? RQ2: What gender proportion is typical for the Ukrainian Economics discipline, and how can it be compared with similar results for other data sets (countries)? RQ3: What level of alphabetization characterizes the Ukrainian Economics discipline, and is it possible to identify any gender-related distinctions?
Answering the main research questions, this paper serves also as another evidence of the usefulness of Crossref data as a potential source for bibliometric analysis. Economic publications are considered in many other studies, where the data from Web of Science or Scopus databases are exploited, see (Schläpfer, 2010; Truc et al., 2021; Vaio & Weisdorf, 2009; Wei, 2018; Zhao et al., 2016). And this is reasonable in order to assess top-impact output and reveal the research front in Economics. However, if the rest of the entire picture is needed, the potential of other sources such as Crossref can be efficiently used. In this context, Ukraine has an advantage – a special interface called Open Ukrainian Citation Index (OUCI) was developed a few years ago. It provides a possibility to extract structured Crossref metadata related to all journals published in Ukraine (Cheberkus & Nazarovets, 2019). Moreover, all these journals are labeled by subject category according to the Ukrainian national classification scheme. While this data source is not as comprehensive as a national current research information system could be, it provides a unique opportunity to supplement knowledge about the research output of Ukraine.
The paper is organized as follows: the description of our data set is provided in Section 2; the applied name disambiguation procedure is described in Section 3; Section 4 describes the peculiarities of the gender disambiguation procedure for our data and contains the results of the gender analysis. The ordering of author's name for the Ukrainian Economics discipline is studied in Section 5; the final discussion can be found in the last Section.
This paper is a continuation of the authors’ previous work (Mryglod et al., 2021), which analyzes Crossref data for publications in Ukrainian journal papers in the field of Economics. The same principle of collecting data is used here, i.e., Crossref publication records related to Ukrainian Economics journals. A not-for-profit membership organization Crossref
See the Order of the Ministry of Education and Science of Ukraine N32 (2018, January 15).
The topical relevance of each journal is defined using the Speciality search filter. The following specialties are considered to be related to the Economics field (a similar subject classification is used by the State Attestation Commission of Ukraine
Official web-page of the State Attestation Commission of Ukraine:
In addition to the data available from Scopus and Web of Science, Crossref provides an important piece of the puzzle required to build the full picture of Ukrainian Economics research. The results presented in this paper are based on the updated data set: data collection is performed at the end of February 2021. Altogether, 25,933 records for papers published in Ukrainian Economics journals between 2002 and 2020 were collected (the annual publication statistics is low before 2012 and rapidly increases afterward: 97% of records correspond to the period 2013–2020). The imported records contain the following fields: Publication year; Journal ISSN; DOI; Publisher; Title; Authors’ names; Number of DOI-to-DOI citations (if the information is provided by Crossref depositors); Journal is indexed in Scopus Yes/No; Journal is indexed in Web of Sciences Yes/No (up-to-date information in the last two fields is added by OUCI).
While data analysis at the level of papers is performed (Mryglod et al., 2021), many interesting questions can be put at the level of authors. To give an example, typical individual productivity or authors’ collaboration patterns have to be known to set benchmarks for comparing, assessing, or detecting examples of unusual publishing behavior. What is also important, the authors’ gender is typically (and in this work) inferred from the given names. Therefore, the gender label cannot be assigned if only initials are specified instead of the full name. However, merging various records related to the same person, allows us to enlarge the statistics of papers with genderized authors. For example, gender can be defined for
Due to the numerous nuances listed above, it is too difficult to perform a full disambiguation procedure automatically. That can be done only partially and only in a semi-automatic way. The following criteria and approaches are used:
Identical names found in different papers are considered as related to one person since the limited data set that corresponds to a particular subject area is studied. The assumption that there is a low probability of duplicated names within our data set is confirmed by manually checking randomly selected records. Of course, exceptions are possible. Authors’ records with identical first and second names are separated if both appear in the same paper. The existence of common co-authors for two authors is considered as an argument to merge corresponding records. A manually created list of Ukrainian given names together with “synonymical” forms (Latin and Cyrillic) was used to find candidates for merging
The list is available online, The records are merged if no contradictions appear. To give an example, all names from the following list: It seems like the same name appears twice, since Cyrillic “A” and Latin “A” are used for the author's initial.
The list of authors’ names was processed using the own Python code to find the list of candidates for merging and to mark them as more or less probable. The final merging was manually confirmed using the results of this preliminary automatic procedure. Additional manual checks were performed for particular cases, where candidates are considered as important players due to a large number of publications or co-authors. Merging was not performed for the pairs where ambiguity remains, but even so, the initial set of 31.5 thousand authors’ records was reduced to 23,094.
As mentioned before, the gender label of an author is inferred from his/her given name. Since the majority of authors are from Ukraine (Mryglod et al., 2021), Slavic first names are predominantly found in our data set. Besides the list of Slavic names manually labeled by gender, free web resource Genderize
Genderize.io | Determine the gender of a first name. Only results for the names occurring at least 10 times with a probability of more than 0.9 were taken into account.
Some typical endings of Slavic surnames can be considered as gender-specific. Author records genderized on the previous step were used as a validation subset in order to check whether surnames’ endings are distinctive enough for our data set. The gender of 19.85% females was repeatedly recognized using gender-specific endings of last names:
Thus, gender labels were inferred from surnames for an additional 1,260 female and 828 male authors. Altogether, we continue with 63.7% of genderized authors’ names. And having in mind that the number of male authors is slightly underestimated, one can state that 1.5 times more female authors are found.
Our finding is in line with the statement in (Larivière et al., 2013), where Ukraine is mentioned among other “countries with lower scientific output” that are characterized by more prevalent female authorship. But our research is not cross-disciplinary, it is initially related to the Economics area. Moreover, while Web of Science data was used in (Larivière et al., 2013), we exploit Crossref as a data source in our work. The remaining question is how different can be results obtained for top journals indexed in authoritative databases such as Scopus or Web of Science and for data beyond these sources.
It is shown that gender disparities are disciplinary-dependent (Nicola & D’Agostino, 2021. The Economics discipline is considered rather as a male-dominated one
Gender in the global research landscape. The gender is undetectable for the rest authors. Of course, one has to remember that different datasets are used in these different case studies.
Another interesting relevant issue is the analysis of gender mixing and the patterns for forming authorship teams. The annual change of shares of papers classified according to the gender of authors for our data is shown in Fig. 1. Five categories are defined here for 17,352 out of 25,933 papers (the rest cannot be classified due to the lack of gender information about authors): papers authored by a single person are labeled as
Figure 1

As Fig. 1 shows, many papers correspond to the decreasing but still largest category of publications by a single female author. On the one hand, this is in line with the conclusion in (Boschini & Sjögren, 2007), where over-representation of single female authors is reported. On the other hand, it was already mentioned that the number of papers by females is expected to be larger in principle, simply due to the larger number of female authors. Indeed, for the same data set, if authors’ gender labels “F”, “M” and “undefined” are randomly reshuffled first, the share of papers in the
Distribution of papers according to authors’ gender: a single female author (
Entire data set | Reshuffled* | Not indexed in Scopus and/or WoS | Indexed in Scopus and/or WoS | ||||
---|---|---|---|---|---|---|---|
# records | Share | Share | # records | Share | # records | Share | |
All papers | 25933 | - | - | 22683 | - | 3250 | - |
All papers assigned to one of the categories | 17352 | 100% | (13785 records on average) | 15422 | 100% | 1930 | 100% |
F solo | 5871 | 33.8% | 34.8% | 5684 | 36.9% | 187 | 9.7% |
F coll | 3085 | 17.8% | 10.4% | 2841 | 18.4% | 244 | 12.6% |
MIX coll | 4283 | 24.7% | 28.5% | 3236 | 21% | 1047 | 54.3% |
M coll | 816 | 4.7% | 4.1% | 598 | 3.9% | 218 | 11.3% |
M solo | 3297 | 19% | 22.2% | 3063 | 19.8% | 234 | 12.1% |
Averages for 10 versions of the original set of publications labeled by categories after random reshuffling of authors’ gender labels are provided.
The homophily of co-authorship groups with respect to authors’ gender in the Economics field was discussed in (Boschini & Sjögren, 2007). At first glance, our results deny this conclusion. The cross-gender category is the second-largest one among the gender-labeled papers in our data, see Fig. 1 and Table 1. However, it is easy to show that this share is smaller than expected. And indeed, solo-gender collaborative papers by female authors are over-represented in the real data set in comparison to reshuffled data. It is interesting to see that the shares of cross-gender papers and collaborative papers by male authors are remarkably higher for papers that are indexed in Scopus and/or WoS, see Table 1. To some extent, this is in agreement with the conclusion about the tendency to comparatively lower gender homophily in higher-impact journals (Holman & Morandin, 2019). Such gender mixing can be seen in a very positive way, since gender is one of the most important dimensions of team diversity, which, in its turn, is often considered as a powerful catalyst for creativity, see, e.g., (Farhoomand & Drury, 2001; Liao, 2010; Reynolds & Lewis, 2017) and references therein.
Another aspect of forming collaboration teams – the ordering of authors – can be studied using our data. The position of the author's name in a list, which is ordered neither randomly nor alphabetically, can be considered as a basis for credit allocation. In this case, it is also reasonable to investigate the correlation between authors’ gender and their roles (i.e., positions in co-authorship lists). For example, the statistics of female first-authored journal articles are studied in (Thelwall & Mas-Bleda, 2020).
Economics is often considered as one of the fields where alphabetization is common (see, e.g., Frandsen & Nicolaisen, 2010; Kuld & O’Hagan, 2017; Levitt & Thelwall, 2013; Waltman, 2012), although the alphabetization rate in the economy has declined somewhat over the past decade (Wohlrabe & Bornmann, 2022). Moreover, top Economic journals are characterized by the share of alphabetized articles that is even higher compared to other Economic journals (70% vs. 60%, correspondingly) (Kuld & O’Hagan, 2017; Levitt & Thelwall, 2013). Such a way of ordering can be interpreted as a declaration of equal authors’ contribution. At the same time, the first position can still be perceived as special by external assessors. The so-called “alphabetical discrimination” is discussed in (Einav & Yariv, 2006; Kuld & O’Hagan, 2017): The staff members of a top U.S. economic department whose surnames start with letters from the first part of the alphabet are found to be more tenured. In some sense, this can be seen as a consequence of the Thomas theorem (Bornmann & Marx, 2020): even a groundless consideration of the first author as the principal one can cause further advantage in an academic career. Therefore, the following two questions are addressed further: (i) Is there any gender preference for the first position in a co-authorship list? and (ii) Can we state that the alphabetical ordering of authors is typical for the Ukrainian Economics discipline, in general?
The more authors in the collaboration list, the less the probability of accidental alphabetical authorship is (see, e.g., Kuld & O’Hagan, 2017; Waltman, 2012). Since small co-authorship lists are dominant in the Economics discipline, in many cases one cannot be sure whether the names are sorted alphabetically intentionally or unintentionally. But non-alphabetical sorting is an unambiguous indicator of other priority scheme usage. Therefore, we count the share of papers with authors’ names ordered neither by Latin nor Cyrillic alphabets. Corresponding numbers for different publication samples are provided in Table 2. For example, it can be seen that in almost half of all collaborative articles, authors are not sorted by the last names. Moreover, knowing the exact numbers of authors in the rest of the papers, one can suggest that another 21.3% of the papers are sorted in alphabetic order accidentally
Considering all collaborative papers in the initial data set, one can find that the total number of alphabetically sorted papers is 6,641. The probability for a paper to be unintentionally ordered alphabetically depends on the number of authors
The numbers of non-alphabetically ordered papers and estimated volume of potentially alphabetized papers (adjusted values) for different data samples.
Number of papers (collaborative only) | Number (%) of papers with definitely non-alphabetical ordering of authors’ names | Adjusted number (%) of papers that are potentially alphabetically ordered by intention | |
---|---|---|---|
Entire data sets | 13237 | 6596 (49.8%) | 3824 (28.9%) |
Papers marked by gender categories | 8184 | 4057 (49.6%) | 2384 (29.1%) |
Papers indexed in Scopus and/or WoS | 2592 | 1672 (64.5%) | 601 (23.2%) |
Papers NOT indexed in Scopus and/or WoS | 10645 | 4924 (46.3%) | 3223 (30.3%) |
Cross-gender papers | 4283 | 2369 (55.3%) | 1169 (27.3%) |
Solo-gender papers | 3901 | 1688 (43.3%) | 1215 (31.1%) |
The analysis of Ukrainian journals within the Economics discipline, started in the previous work (Mryglod et al., 2021), is continued in this study. Revealing the typical features of this particular segment of scholarly literature is important for solving many practical issues related to the development of assessment procedures at the national level. However, another goal of this work is to reinforce the call for complete and qualitative metadata. Crossref database is used here to describe one of SSH disciplines for poorly studied European countries. Publication metadata related to Ukrainian Economic journals is collected from the Crossref database. An attempt was made to conduct an analysis with an emphasis on gender effects at the level of individual authors. However, the procedure for disambiguating authors’ names can be done only partially. A number of peculiarities of processing author names related to the usage of Cyrillic and local traditions of parallel usage of different forms of names and even surnames are highlighted. Since the gender of an author is inferred from the full first name, even partial merging of authors’ records allows one to increase the statistics of publications with authors labeled by gender. Moreover, a manually created list of gender-specific endings for Slavic last names was used to enlarge the number of genderized authors. Altogether, 63.7% of 23,094 author records were labeled by gender, and the number of female authors is found to be 1.5 times larger than male authors. This result contradicts the statements about the masculinism nature of Economics research. Alternatively, female dominance in Ukrainian Economics research may be considered as a hint about its specific thematic spectrum. According to (Thelwall et al., 2019), keywords related to qualitative and exploratory methods are statistically associated with female scholar authors, while other keywords related to quantitative methods are more related to male authors. In some sense, such sensitivity of gender representation to the topic selection is in line with the conclusion in (West et al., 2013): considerable differences in this context were observed for different Economics subfields. This reinforces our previous conclusions about the specific patterns of collaborativeness in Ukrainian Economics research. Still, an important caution exists: this study is one of the rare examples where data beyond internationally recognized databases is used. Therefore, the guess about the different nature of locally-oriented and internationally-oriented topics chosen for Economics research remains relevant.
Gender mixing is analyzed to find the evidence that gender plays a role when forming collaboration teams. All papers labeled according to five gender-related categories (solo-publications by males; solo-publications by females; solo-gender collaborations of males; solo-gender collaborations of females; and cross-gender collaboration) are considered. While only one third
One fourth if authors with single papers are considered as well.
It is shown that the level of alphabetization of authors’ names in Ukrainian Economic papers is comparatively low. This is especially true for articles indexed in Scopus or Web of Science databases.
Interestingly, different results of gender mixing are found for papers published in journals indexed in Scopus or Web of Science, compared to the rest of publications. Remarkably, while the largest share of papers solo-authored by female authors is expected due to the greater general number of female authors, this category of papers indexed in the international databases is the least represented one. Most papers in internationally recognized journals are characterized by cross-gender collaboration. This can be seen as the manifestation of the so-called reactivity of the Ukrainian Economics discipline (see Aistleitner et al., 2019; Sasvári et al., 2019). One can speculate about the adaptive publishing behavior: a different publishing or even research strategy is chosen depending on the level of recognition and audience of the target journal. The similar conclusion can be drawn for different shares of papers where authors are listed alphabetically. Interestingly, while a high level of alphabetization is found for the Economics discipline in general and even higher for Economic publications in internationally recognized journals, the opposite pattern is observed in Ukrainian Economics research.
To conclude, the results of another case study are presented. Besides the findings specifically related to Ukrainian research, some key aspects related to the processing of non-English metadata are highlighted. It is worth emphasizing once more that many complications become irrelevant if unique digital identifiers are commonly used.
Figure 1

Distribution of papers according to authors’ gender: a single female author (F solo); two or more authors, only females (F coll); two or more authors, at least one female and at least one male (MIX coll); two or more authors, only males (M coll); a single male author (M solo).
Entire data set | Reshuffled |
Not indexed in Scopus and/or WoS | Indexed in Scopus and/or WoS | ||||
---|---|---|---|---|---|---|---|
# records | Share | Share | # records | Share | # records | Share | |
All papers | 25933 | - | - | 22683 | - | 3250 | - |
All papers assigned to one of the categories | 17352 | 100% | (13785 records on average) | 15422 | 100% | 1930 | 100% |
F solo | 5871 | 33.8% | 34.8% | 5684 | 36.9% | 187 | 9.7% |
F coll | 3085 | 17.8% | 10.4% | 2841 | 18.4% | 244 | 12.6% |
MIX coll | 4283 | 24.7% | 28.5% | 3236 | 21% | 1047 | 54.3% |
M coll | 816 | 4.7% | 4.1% | 598 | 3.9% | 218 | 11.3% |
M solo | 3297 | 19% | 22.2% | 3063 | 19.8% | 234 | 12.1% |
The numbers of non-alphabetically ordered papers and estimated volume of potentially alphabetized papers (adjusted values) for different data samples.
Number of papers (collaborative only) | Number (%) of papers with definitely non-alphabetical ordering of authors’ names | Adjusted number (%) of papers that are potentially alphabetically ordered by intention | |
---|---|---|---|
Entire data sets | 13237 | 6596 (49.8%) | 3824 (28.9%) |
Papers marked by gender categories | 8184 | 4057 (49.6%) | 2384 (29.1%) |
Papers indexed in Scopus and/or WoS | 2592 | 1672 (64.5%) | 601 (23.2%) |
Papers NOT indexed in Scopus and/or WoS | 10645 | 4924 (46.3%) | 3223 (30.3%) |
Cross-gender papers | 4283 | 2369 (55.3%) | 1169 (27.3%) |
Solo-gender papers | 3901 | 1688 (43.3%) | 1215 (31.1%) |