Article category: Research Papers
Published online: 19 Nov. 2024
Pages: 1 - 23
Received: 25 Jul. 2023
Accepted: 22 Nov. 2023
DOI: https://doi.org/10.2478/jdis-2024-0026
© 2024 Yurui Huang et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
In today’s world of scientific exploration and technological progress, the significance of collaboration has grown alongside the rising complexity of scientific challenges (Laudel, 2001; Sonnenwald, 2007). This increased complexity requires collective efforts, making collaboration more essential than ever. Collaborative initiatives enhance research by bringing in a variety of perspectives from different disciplines, which are crucial for effectively tackling complex challenges. This idea is backed by influential studies that illustrate how collaborations draw upon diverse expertise and viewpoints, improving the capacity to understand and solve intricate scientific problems (Huang et al., 2023; Klein, 2005; Qin et al., 1997).
The mechanisms of scientific collaboration often manifest through co-authorships in articles and patents (Barber & Scherngell, 2013; Huang et al., 2024; Newman, 2001; 2004). In separate studies across various fields, including biomedical research, physics, and mathematics, Newman found that most scientists have a limited number of collaborators, while a few have hundreds or even thousands (Newman, 2004). Understanding collaboration patterns is crucial to comprehending the evolution of scientific progress. Guimerà et al. (2005) studied factors such as team size and the involvement of newcomers and found that team composition significantly influences collaboration structures and outcomes in both the arts and the sciences. Similarly, Tomassini and Luthi (2007) observed the dynamics of collaborative networks in genetic programming, recognizing the changing nature of these ties. It is therefore essential to understand the collaboration practices of scientists in order to learn how networks develop and function, which in turn can inform strategies to improve collaboration in scientific research and innovation.
Building on this understanding of collaborative patterns, the research of Barabási et al. (2002) and Barber and Scherngell (2013) further reinforces the significance of co-authorship networks within the scientific community. These studies reveal that such networks are not uniformly structured but are composed of distinct substructures, underscoring their scale-free nature. This characteristic is shaped by both internal and external co-authorship links, influencing the networks’ scaling and topological evolution (Fortunato, 2010). Additionally, the identification of communities within these networks, as indicated by Van Nguyen et al. (2012), relies heavily on the density of links within groups compared to those between them.
The effective detection of communities within these networks is a pivotal aspect of network analysis, as highlighted by Reichardt and Bornholdt (2006) and Fortunato and Hric (2016). This process, which uncovers the clusters in intricate networks, offers invaluable insights into the organizational and functional structures therein. Methodologies for community detection range from foundational graph-theoretical approaches to advanced machine learning and have been a key factor in identifying communities within various networks (Ng et al., 2001; Zhang et al., 2009; Zhao et al., 2005). The field has been greatly advanced by the application of deep learning, probabilistic modeling, and multi-resolution strategies to discover communities within expansive and varied networks (Liu et al., 2020; Rosvall et al., 2009; Zhang et al., 2009). Zhang et al. (2023) applied the attributed network clustering algorithm (Falih et al., 2018) to better understand the collaborations among statisticians, and Mao et al. (2017) leveraged machine learning and network theory to detect topical scientific communities. Such innovative approaches not only support the analysis of collaboration patterns but also encourage strategic research and offer a window into the collaborative evolution of scientific knowledge.
Recognizing the unique role of mathematics as a foundational language across scientific areas and its importance in fostering research collaborations, our study specifically investigates the collaboration network of mathematicians (Asif & Islam, 2016; Grossman, 2002). With this in mind, we focus on scientific collaboration and career development within the mathematical community and ask two questions: what is the underlying community structure in the collaboration network of mathematicians, and what roles do elite mathematicians play within their communities compared with their peers? These research questions aim to identify and analyze communities within the collaboration network, shedding light on the collaborative characteristics of elite mathematicians and their peers in advancing mathematical research. To achieve this, the study draws publication data from OpenAlex (Priem et al., 2022), an open dataset, and collects supplementary information on mathematical awards. Focusing on collaborations among mathematicians, including recipients of prestigious awards in the field of mathematics such as the Fields Medal and Lobachevsky Prize, the research aims to uncover and delineate clusters of collaborations within this network by applying two community detection algorithms: Greedy Modularity Maximization (GMM) (Clauset et al., 2004) and Infomap (Rosvall et al., 2009). Subsequently, a detailed analysis of these detected communities is conducted, probing into their inherent characteristics, such as collaborative traits and impact within distinct mathematical sub-fields.
Investigating the network characteristics of mathematicians, particularly those who are award laureates, is a vital part of modern scientific research. This study aims to understand how these mathematicians are positioned within collaboration networks and the impact they have on these communities. By examining the roles, influence, and structural formations of these mathematicians, especially those who have received prestigious awards, we aim to shed light on the dynamics of their collaborative interactions. The insights gained from this research will enhance our understanding of collaborative networks in mathematics and provide valuable perspectives on the dynamics and evolution of scientific collaboration across various disciplines.
The basic characteristics of mathematicians’ collaborative networks.
N (nodes) | E (edges) | 〈k〉 (average degree) | ln N | 〈…〉 | density |
---|---|---|---|---|---|
79,016 | 342,022 | 8.657 | 11.2774 | 0.1972 | 0.0001 |
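As a consistency check on the table, the average degree, the logarithm of the network size, and the density of an undirected simple graph follow directly from its node and edge counts. The sketch below assumes that the first two columns are the numbers of nodes and edges; the variable names are illustrative.

```python
import math

# Assumed reading of the table: first column = nodes, second column = edges.
n_nodes = 79_016
n_edges = 342_022

# Undirected graph: every edge adds one to the degree of two nodes.
avg_degree = 2 * n_edges / n_nodes

# Natural logarithm of the network size.
log_n = math.log(n_nodes)

# Share of all possible node pairs that are actually linked.
density = 2 * n_edges / (n_nodes * (n_nodes - 1))

print(f"average degree = {avg_degree:.3f}")
print(f"ln N           = {log_n:.4f}")
print(f"density        = {density:.4f}")
```

Rounded to the precision used in the table, these three values agree with the third, fourth, and sixth columns.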
In this study, we leverage the extensive dataset from OpenAlex, a key resource for our analysis. Our study collected data on scholars who fulfilled specific requirements: they were prolific authors with more than ten publications between 1960 and 2021. We classified authors primarily active in mathematics and identified collaborative links among them. Figure 1(a) shows a significant rise in collaboration among mathematicians after 2005, reflecting the necessity to tackle complex problems requiring interdisciplinary cooperation. Additionally, we compiled information on renowned mathematicians who have received awards in their field from platforms including Wikipedia, Wikidata, and official award websites. This supplementary dataset encompasses 341 unique award categories and includes detailed information on the mathematicians’ demographics and achievements. Of these distinguished award recipients, 395 individuals were successfully matched with profiles in the OpenAlex database.

Data characteristics of mathematicians.
This data collection methodology enabled us to gain insights into the collaborative networks of 79,016 mathematicians affiliated with 8,833 institutions across 172 countries. We were also able to categorize their main areas of research within mathematics, including fields like pure mathematics, mathematical analysis, combinatorics, and statistics, as shown in Figure 1(b). Following this, we constructed an undirected weighted collaborative network, designated as
In our network

Network characteristics.
In network analysis and statistical modeling, centrality metrics are crucial for evaluating the significance or influence of nodes within a network. These metrics are instrumental in deciphering the network’s structural nuances and the dynamics of influence and information flow. This paper highlights four primary centrality measures, each offering unique insights into nodes’ roles within weighted networks: Betweenness, Closeness, Harmonic centrality, and Eigenvector centrality. For this analysis, we employed the Python interface of igraph (Csardi & Nepusz, 2006) to execute and calculate four key centrality metrics. Here,
Betweenness is delineated by the degree to which a node falls on the shortest paths interlinking other nodes. Nodes with high Betweenness are akin to connectors or intermediaries and play a crucial role in the network’s efficient communication. Mathematically, it is defined as the total number of shortest paths that traverse through a particular node and is given by:
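A standard way to write this quantity, in notation that is assumed here rather than taken from the source, sums over every pair of other nodes the share of their shortest paths that pass through node $v$:

$$
C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}
$$

Here $\sigma_{st}$ is the number of shortest paths from $s$ to $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$. When the shortest path between each pair is unique, the sum reduces to a plain count of shortest paths through $v$.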
Closeness evaluates a node’s proximity to all other nodes, advocating for the importance of nodes that can rapidly connect with the rest. Higher closeness indicates a node’s strategic position for swift information spread across the network. The closeness for a node is computed as the inverse sum of its shortest path distances to all other nodes:
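In symbols, one common form is the following; the notation and the normalization by $N-1$ are assumptions, and the unnormalized variant simply drops that factor:

$$
C_C(v) = \frac{N-1}{\sum_{u \neq v} d(v,u)}
$$

Here $N$ is the number of nodes and $d(v,u)$ is the weighted shortest-path distance between $v$ and $u$.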
Harmonic centrality modifies the concept of closeness centrality by focusing on the reciprocal of the weighted shortest paths from one node to all others rather than averaging these distances. This subtle difference shifts the emphasis toward the influence exerted by shorter paths within the network. Harmonic centrality is summed up as follows:
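A standard form of this measure, again in assumed notation, is:

$$
C_H(v) = \sum_{u \neq v} \frac{1}{d(v,u)}
$$

The term $1/d(v,u)$ is taken as $0$ when $u$ cannot be reached from $v$, which keeps the sum finite on disconnected networks. Some implementations additionally divide by $N-1$.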
This measure emphasizes nodes that are not only close to other nodes but also connected to a wide range of other nodes, contributing to global network accessibility and cohesion.
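To make the difference between closeness and harmonic centrality concrete, the following is a minimal pure-Python sketch on a toy graph. It is not the igraph-based computation used in the study. It assumes every edge already carries a positive length; how collaboration weights are converted into lengths is not specified here, and all names are illustrative.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra distances from `source`; `graph` maps node -> {neighbor: length}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, length in graph[node].items():
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def closeness(graph, node):
    """Number of reachable nodes divided by the sum of distances to them."""
    dist = shortest_distances(graph, node)
    total = sum(d for other, d in dist.items() if other != node)
    return (len(dist) - 1) / total if total > 0 else 0.0

def harmonic(graph, node):
    """Sum of reciprocal distances to all reachable nodes."""
    dist = shortest_distances(graph, node)
    return sum(1.0 / d for other, d in dist.items() if other != node)

# Hypothetical path graph a - b - c - d with a longer last edge.
toy = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 2.0},
    "d": {"c": 2.0},
}

for v in toy:
    print(v, round(closeness(toy, v), 4), round(harmonic(toy, v), 4))
```

Because unreachable nodes simply contribute nothing to the harmonic sum, that measure stays well defined on disconnected graphs, whereas closeness has to be restricted to the reachable part.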
Eigenvector centrality reflects a node’s prestige by recognizing that connections to influential nodes amplify a node’s score. In essence, nodes with extensive connections to other central nodes are deemed pivotal. The centrality score is iteratively derived from the neighbor’s scores, defined by the following relation:
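The relation is usually written as $x_v = \frac{1}{\lambda}\sum_{u} w_{vu}\,x_u$, where $w_{vu}$ is the weight of the edge between $v$ and $u$ and $\lambda$ is the largest eigenvalue of the weighted adjacency matrix (notation assumed here). A minimal power-iteration sketch of that relation follows. It illustrates the idea rather than the igraph routine used in the study, and the function name and toy graph are hypothetical.

```python
def eigenvector_centrality(weights, iterations=1000, tol=1e-12):
    """Power iteration on a symmetric weighted adjacency dict {v: {u: w}}."""
    nodes = list(weights)
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Iterate with (A + I) instead of A: same eigenvectors, but it avoids
        # oscillation on bipartite graphs such as the star used below.
        new = {v: x[v] + sum(w * x[u] for u, w in weights[v].items()) for v in nodes}
        top = max(new.values())
        new = {v: val / top for v, val in new.items()}  # scale so the maximum is 1
        if max(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    return x

# Hypothetical star graph: hub "h" linked to three leaves with unit weights.
star = {
    "h": {"a": 1.0, "b": 1.0, "c": 1.0},
    "a": {"h": 1.0},
    "b": {"h": 1.0},
    "c": {"h": 1.0},
}
scores = eigenvector_centrality(star)
print(scores)
```

On the star, the hub receives the top score and each leaf scores $1/\sqrt{3}$ of it, which shows how a node's score is inherited from the scores of its neighbors.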
The study of elite mathematicians within academic communities is instrumental in unraveling the complexities of knowledge sharing, collaborative patterns, and the structure of scholarly networks. This exploration, particularly through the lens of network analysis, has become a vital aspect of modern research (Asif & Islam, 2016; Gaskó et al., 2016; Izquierdo et al., 2018). Our research identifies these elite mathematicians based on their significant career achievements, namely their receipt of notable awards in mathematics. We then segment collaboration networks into distinct communities using two algorithms and focus on calculating the four aforementioned centrality metrics within these communities. This method provides a deeper insight into the dynamics of centrality and the critical role these eminent figures play in the formation and evolution of academic networks.
The outcomes, outlined in Table 2 and depicted in Figure 3, reveal a distinct trend when comparing award-winning mathematicians with their peers. Significant differences in Betweenness, Closeness, and Harmonic Centrality between these two groups were identified, showing statistically meaningful variations (

Four centrality metrics were computed within communities identified by both GMM and Infomap, and subsequently compared between awardees and other mathematicians. The metrics assessed were: (a), (e) Betweenness; (b), (f) Closeness; (c), (g) Harmonic Centrality; and (d), (h) Eigenvector Centrality.
The
Algorithm | Betweenness | Closeness | Harmonic centrality | Eigenvector centrality |
---|---|---|---|---|
GMM | 47,777.2381*** | 0.0141*** | 0.0245*** | -0.0455*** |
Infomap | 96.5236*** | 0.0053 | 0.0323*** | 0.1148*** |
The analysis of eigenvector centrality for awardees shows contrasting results between communities identified by GMM (negative) and Infomap (positive). As previously discussed, although both algorithms uncover community structures, they operate on different principles and exhibit distinct strengths. GMM is straightforward but may encounter resolution limits, hindering its ability to detect very small or loosely connected communities. In contrast, Infomap is adept at identifying community structures in large, modular networks, typically resulting in more balanced community sizes. This is a consequence of the fact that, in larger networks, eigenvector centrality tends to exhibit greater consistency. The complexity and sheer number of nodes in these extensive systems reduce the influence of minor connectivity changes on individual nodes’ eigenvector centrality, thereby enhancing its stability. Conversely, in smaller networks, certain nodes might display heightened eigenvector centrality due to simpler connection patterns, potentially leading to concentrated control of information flow among a few nodes. An examination of eigenvector centrality across the entire collaboration network reveals no substantial difference between awardees and their peers (-0.00127 with
Researchers frequently collaborate with familiar individuals, such as mentors, past research colleagues, or established experts in their academic domain (Singh et al., 2021; Yu et al., 2021). These collaborations form complex networks characterized by well-established connections, notably evident in scholarly co-authorship. Understanding these connections is crucial in exploring scientists’ intricate collaboration patterns and identifying cohesive clusters or communities within academic networks.
In our study of mathematicians’ collaboration network, we employed two community detection algorithms, namely GMM and Infomap. The first one identified 2,629 communities within network

Distributed characteristics of mathematicians and mathematical awardees within communities.
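GMM greedily merges communities so as to increase modularity (Clauset et al., 2004). As a minimal sketch of that objective only, not of the merging procedure or of the pipeline used in this study, the weighted modularity of a given partition can be evaluated as follows. The function name and the toy graph are illustrative assumptions.

```python
def modularity(edges, community_of):
    """Weighted modularity Q of a partition.

    edges: iterable of (u, v, weight) for an undirected graph, each edge once.
    community_of: dict mapping node -> community label.
    """
    total_weight = 0.0
    internal = {}   # community -> weight of edges lying inside it
    strength = {}   # community -> summed weighted degree of its nodes
    for u, v, w in edges:
        total_weight += w
        cu, cv = community_of[u], community_of[v]
        strength[cu] = strength.get(cu, 0.0) + w
        strength[cv] = strength.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    # Q = sum over communities of (internal share) - (expected share)
    return sum(
        internal.get(c, 0.0) / total_weight - (s / (2 * total_weight)) ** 2
        for c, s in strength.items()
    )

# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
print(q_split, q_merged)
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of comparison a greedy modularity method makes at each merge step.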
In examining whether awardees are evenly distributed across communities, we postulate a straightforward, linear correlation between the number of awardees and the size of each community. To test this idea, we implemented a statistical simulation for randomly allocating awards among different communities. This simulation allocated awardee numbers across communities using a multinomial distribution
Next, we ran 1,000 random simulations using the described multinomial distribution
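The random allocation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: each awardee is assigned to a community with probability proportional to community size, and the community sizes and awardee count are hypothetical toy values.

```python
import random

def simulate_awardee_allocation(community_sizes, n_awardees, n_runs=1000, seed=0):
    """Draw `n_runs` multinomial allocations of awardees across communities.

    Each awardee lands in a community with probability proportional to that
    community's size, i.e. the null model of a purely size-driven distribution.
    """
    rng = random.Random(seed)
    labels = list(range(len(community_sizes)))
    runs = []
    for _ in range(n_runs):
        counts = [0] * len(community_sizes)
        for c in rng.choices(labels, weights=community_sizes, k=n_awardees):
            counts[c] += 1
        runs.append(counts)
    return runs

sizes = [500, 300, 150, 50]   # hypothetical community sizes
runs = simulate_awardee_allocation(sizes, n_awardees=20)

# Average simulated awardee count per community across all runs.
mean_counts = [sum(run[i] for run in runs) / len(runs) for i in range(len(sizes))]
print(mean_counts)
```

Under this null model the mean counts stay close to the awardee total multiplied by each community's share of all nodes. Observed counts that fall far outside the simulated spread would point to awardees concentrating in particular communities.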
Furthermore, we explore the interplay between the research orientations of mathematicians and the communities identified within their collaboration network, categorizing mathematicians into 16 distinct mathematics sub-fields (see Figure 1(b)). To evaluate the congruence between these sub-fields and the communities delineated by the two algorithms, we apply normalized mutual information (NMI; details are given in the Appendix). As a robustness check, we design a random experiment in which discipline labels are randomly assigned to all nodes in the network and the NMI between community labels and the random discipline labels is calculated for each algorithm. Repeating this experiment 1,000 times yields the range of NMI values expected under random labeling. Comparing the communities identified by both algorithms against this random baseline, Table 3 shows that the real NMI values lie well above the random range, indicating a significant positive association between the communities and the sub-fields. This implies that the identified communities inherently reflect aspects of these sub-fields, suggesting a natural tendency for mathematicians working in similar areas to collaborate more frequently.
NMI analysis between true field labels and detected community labels.
Method | GMM | Infomap |
---|---|---|
Real NMI | 0.2222 | 0.2404 |
Random NMI | (0.03361, 0.03365) | (0.08247, 0.08250) |
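A comparison of this kind can be sketched in a few lines. The version below is an illustration under assumptions, not the study's implementation: it normalizes mutual information by the mean of the two label entropies (the Appendix may use a different normalization), builds the random baseline by shuffling the field labels, and reports the minimum and maximum over the runs.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )

def nmi(a, b):
    """Mutual information divided by the mean of the two entropies."""
    h = (entropy(a) + entropy(b)) / 2
    return mutual_information(a, b) / h if h > 0 else 1.0

def shuffled_nmi_range(communities, fields, n_runs=1000, seed=0):
    """Min and max NMI when field labels are randomly reassigned to nodes."""
    rng = random.Random(seed)
    shuffled = list(fields)
    values = []
    for _ in range(n_runs):
        rng.shuffle(shuffled)
        values.append(nmi(communities, shuffled))
    return min(values), max(values)

# Hypothetical labels for eight mathematicians.
communities = [0, 0, 0, 1, 1, 1, 2, 2]
fields = ["algebra", "algebra", "analysis", "analysis",
          "analysis", "statistics", "statistics", "statistics"]
print(nmi(communities, fields), shuffled_nmi_range(communities, fields))
```

A real NMI well above the shuffled range indicates that community membership carries information about sub-field beyond what label frequencies alone would produce.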
These insights lead us to question whether the community information, encapsulating aspects of mathematical sub-fields, could be effectively represented using different methodologies. In particular, we explore the relationship between the number of award recipients in a community, the community’s disciplinary makeup, and its size. To analyze the structured nature of field labels within these communities, we introduce the Simpson index (Simpson, 1949; Somerfield et al., 2008). Our linear regression analysis, examining the relationship between the number of awardees, community sizes, and the Simpson index (indicative of disciplinary composition), yields compelling findings. The results displayed in Table 4 indicate that larger communities tend to have more award recipients, consistent with earlier observations in this study. Conversely, there is an inverse correlation between the Simpson index and the number of awardees. This suggests that communities with higher diversity in their field labels or greater disorderliness are less likely to include award recipients. This finding implies that mathematicians who specialize in specific subdomains and collaborate intensively within those domains are more likely to achieve notable success in their field.
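Because the text treats a higher index as higher diversity of field labels, the sketch below assumes the complementary (Gini-Simpson) form $1 - \sum_i p_i^2$, where $p_i$ is the share of a community's members in sub-field $i$. The exact variant used in the study is not stated in this section, so the formula and the function name are assumptions.

```python
from collections import Counter

def simpson_diversity(field_labels):
    """Gini-Simpson index: 1 minus the sum of squared sub-field shares."""
    n = len(field_labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(field_labels).values())

# Hypothetical communities with increasingly mixed sub-field labels.
focused = ["algebra"] * 8
two_fields = ["algebra"] * 4 + ["analysis"] * 4
four_fields = ["algebra", "analysis", "statistics", "combinatorics"] * 2

for name, labels in [("focused", focused), ("two fields", two_fields),
                     ("four fields", four_fields)]:
    print(name, simpson_diversity(labels))
```

Under this form, a community drawn from a single sub-field scores 0, and the index grows as members spread more evenly over more sub-fields.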
Linear regression analysis on the number of awardees.
Algorithm (dependent variable: #awardees) | GMM | Infomap |
---|---|---|
Community size | 0.0056*** | 0.0092*** |
Simpson index | -0.1075* | -0.0339*** |
*** p<0.001, ** p<0.01, * p<0.05
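A regression of this shape, with the number of awardees as the outcome and community size and Simpson index as predictors, can be sketched with ordinary least squares. The example below is a hedged illustration on synthetic data, not a reproduction of the reported coefficients. It recovers point estimates only and does not compute the significance levels shown in the table.

```python
def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    # Augmented matrix [X^T X | X^T y], size 3 x 4.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]

# Synthetic, hypothetical communities: the outcome is generated exactly from
# known coefficients so that the fit can be checked.
sizes = [10.0, 50.0, 120.0, 300.0, 800.0]
simpson = [0.2, 0.6, 0.4, 0.7, 0.5]
awardees = [0.5 + 0.01 * s - 1.0 * d for s, d in zip(sizes, simpson)]

intercept, b_size, b_simpson = ols_two_predictors(sizes, simpson, awardees)
print(intercept, b_size, b_simpson)
```

In the synthetic data the size coefficient is positive and the diversity coefficient is negative, mirroring the signs reported in the table.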
In our study, which centers on mathematicians and their collaborative networks, we strive to uncover the underlying community structures and collaboration patterns prevalent among scholars in the field of mathematics. A key aspect of our investigation is the distinct role played by award-winning mathematicians within these networks. Employing two robust community detection algorithms, GMM and Infomap, our research delineates the collaboration clusters that involve these distinguished mathematicians. Our study thereby illuminates not just the collaborative behaviors of elite mathematicians but also the broader structure of knowledge dissemination and its impact on the evolution of the field. By analyzing the roles and patterns of award-winning mathematicians, we gain a deeper understanding of the forces shaping mathematical research and its progression.
This study delves into the roles and impacts of elite mathematicians within academic networks, with a particular focus on their influence, which is evaluated using centrality metrics. The results show that mathematicians who are award recipients exhibit higher Betweenness, Closeness, and Harmonic centrality than their peers who have not received such awards. This suggests they have a more influential or connective role within networks.
Additionally, awardees are found to concentrate their collaborations within certain communities as opposed to engaging equally across the entire collaboration network. Randomized trials support these findings, indicating a tendency for awardees to cluster within particular communities. They are not only central in terms of their connections but also in their ability to bridge different clusters within the network, facilitating the flow of ideas and collaborations across disciplinary boundaries specific to mathematicians.
Furthermore, our examination of the connections between mathematical subfields and the communities detected by the two algorithms shows a significant positive correlation. This suggests that collaboration within the mathematician community loosely reflects these sub-fields, with individuals in similar areas tending to work together. This underscores the role of community structures in shaping research trajectories and the evolution of mathematical knowledge, highlighting how mathematicians in similar sub-fields collaborate to enhance coherence and specialization within their respective communities.
Lastly, the study examines the relationship between the number of awardees, the disciplinary diversity within communities, and the sizes of these communities. The analysis indicates that mathematicians are more likely to achieve success in their field when they specialize in specific subdomains and participate extensively in collaborations within those subdomains. Moreover, our findings highlight the need to study scientific collaboration across disciplines. By extending our analysis beyond mathematics to include interactions across various scientific disciplines, future research can provide a more comprehensive understanding of collaborative dynamics and community structures in science. This broader perspective is crucial for addressing complex global challenges that require interdisciplinary solutions.
The primary limitation of this study is its focused examination of mathematicians and their awardees. This specific concentration may restrict the applicability of our findings to the broader scientific community. By directing our attention to a select group of scholars, we may not fully capture the collaborative patterns and community structures that exist across a wider range of scientific disciplines.
Consequently, future research would benefit from adopting an interdisciplinary approach. Given the increasing focus on interdisciplinary collaboration in contemporary research, it is imperative to investigate how collaborations span different scientific domains. This would entail broadening our scope beyond mathematics to encompass interactions between mathematicians and researchers from various other fields. Additionally, this study encounters a data limitation due to the potential incompleteness of the dataset on mathematics awardees, a consequence of the manual online data collection process. This method could introduce issues such as data omissions and challenges in achieving a comprehensive dataset. The reliance on manually collected online data may result in inaccuracies due to inconsistencies in award reporting, the frequency of information updates, and variability in data accessibility. Although we have considered edge weights in GMM and Infomap, applying other community detection methods, such as the k-clique method, is also important in future work to obtain more robust results and to compare methods side by side. Incorporating the k-clique method could provide additional insights into the tightly knit sub-communities within the collaboration network, further enhancing the robustness and depth of our analysis.
These limitations could compromise the depth and reliability of our findings, particularly those concerning the roles and impacts of awardees in the mathematical community’s collaborative networks. Future studies would greatly benefit from more systematic and automated data collection methods, which would likely reduce the occurrence of incomplete data and lead to more thorough and precise analyses. Moreover, the development of new community detection algorithms specifically designed for the distinctive characteristics of scientific networks could substantially improve the precision of community identification. This tailored approach would enable a more accurate understanding of the collaborative dynamics across various scientific fields.