An Analysis of Relations Among European Countries Based on UEFA European Football Championship

Introduction

Football, also known as soccer in North American countries, is not merely a sport. It is a prominent global industry worth billions of dollars, eagerly followed by millions of fans worldwide. As of 2020, the total revenue of the European professional football market was estimated at 25.2 billion Euros (Lange 2021). Given the popularity of football and the monetary size of the football market, player transfers can be regarded as serious trade transactions between countries. The transfers of key national players can also shed light on relationships among participating countries, considering how they affect the outcomes of major public events such as national football games. Player transfers have become more common over the years. Increased global connectivity, supported by the ease of transportation and growing media and social-media resources, has made it easier for talented players to be recognized internationally and to play outside of their home countries.

With this context, it is easy to realize that data from international tournaments and the transfers happening around those provide us with a very valuable and systematic tool in evaluating relationships among participating countries. For example, since 1960, UEFA has organized the European Football Championship tournament among its members every four years. The Union of European Football Associations (UEFA) (www.uefa.com) is the governing body of European football and the umbrella organization for 55 national associations. The European Football Championship is known to be the second most prestigious international football tournament after the World Cup. The multi-decade data publicly available from the European Football Championship records provides an invaluable resource for analyzing the over-time trends of European football as well as the relationships among participating countries and their reflection via the lens of football.

In this paper, we construct social networks over time from the contestants of several UEFA European Football Championship (EFC) tournaments and analyze the relationships between European countries. Although 16 tournaments have been held in total, we focus on the last eight (the last 30 years), since the number of contestants in earlier championships was very limited: the first European Football Championship tournament, in 1960, was held with 4 teams. Later on, the number of teams was extended to 16 qualifying teams. The qualification period starts two years prior to the contest, during which the members of UEFA play each other to qualify as one of those 16 teams. Using clustering and centrality metric analysis, we aim to uncover patterns within UEFA country-to-country networks and answer questions such as the following:

How did the domestic player bias change over the years?

Which countries are most preferred by players? How did this change over the years?

Are there any close-knit clusters among countries in terms of player transfers?

The remainder of this paper is organized as follows. In Section 2, we review several studies that focus on network analysis in sports. In Section 3, we describe how we construct our dataset and the analysis tools and techniques that are used. Finally, in Section 4, we present our results and perform an in-depth analysis and interpretation of our findings.

Related Works

In this section, we provide an overview of previous research on sports teams that used statistical and social network analysis techniques. Lewis (2004) is considered one of the most influential books on sports ever written. After the enormous success achieved in baseball by using statistical analysis to improve team performance (Lewis 2004), there was a surge of interest among mathematicians and statisticians in applying similar techniques to other sports. Lago-Peñas, Lago-Ballesteros, and Rey (2011) gathered several game-related statistics to analyze the differences between performance indicators of winning and losing teams. Similar to Lewis (2004), several studies used statistical techniques to develop game strategies by analyzing player substitutions (Rey, Lago-Ballesteros, and Padrón-Cabo 2015) and referee bias (Buraimo, Simmons, and Maciaszczyk 2012).

In baseball, the players must follow a predefined course between bases, whereas in a soccer game all players are free to move within the limits of the field. Therefore, analyzing soccer games usually requires more complex models, namely networks. Several studies model soccer games as interaction networks to predict the performance of teams in different games, modeling the players as the nodes and their interactions, such as physical distance on the field or the number of ball passes between two players, as the edges (Buldú, Busquets, Martínez, Herrera-Diestra, Echegoyen, Galeano, Luqu 2018; Korte, Link, Groll, and Lame 2019). The analyses conducted by these studies are game-based, i.e., each game is modeled as an interaction network of players and the statistics are calculated individually for each network. Both Buldú, Busquets, Martínez, Herrera-Diestra, Echegoyen, Galeano, Luqu (2018) and Korte, Link, Groll, and Lame (2019) examine metrics such as degree distribution and centrality, while Korte, Link, Groll, and Lame (2019) also focus on additional metrics such as flow centrality. Flow centrality measures the fraction of plays in which a player is involved at least once, relative to all plays by the player's team. Using this metric, they observe the behavioral differences between offensive and defensive players. This connects back to team success, which has long been an important research area that attracts considerable attention from social network analysis researchers.

Buldú, Busquets, Echegoye, and Seirul.lo (2019) utilize different network metrics to understand team success better by focusing on the F.C. Barcelona coached by Guardiola, which is considered as one of the best football teams of all time. The authors compare their findings from this team to their opponents in the Spanish leagues. In Fransen, Van Puyenbroeck, Loughead, Vanbeselaere, De Cuyper, Broek, and Boethe (2015), authors focus more on success of leadership within the football teams and find that athlete leaders had more of a motivational and social leadership role than their coaches at times. De Pina and Laire (2017) investigate synergistic interaction between teammates by analyzing 12 games of the Group Stage of UEFA Champions League 2015/2016 Group C by using public records from TV broadcasts. Their results confirmed that density, but not clustering coefficient or centralization, was a significant predictor of the success of offensive plays.

Although modeling individual soccer games is very informative for developing new and successful game strategies, these models focus only on player performance and the strategies applied by the trainers at the time of the game. Such models ignore the long-term effects of the social environment around the team, such as the impact of international politics, the economy, and team management decisions that lead to transfers, as well as the long-term relationships between different teams, which may contribute to the motivations of players in specific games. In an attempt to include these additional factors, a few other papers that study social networks from UEFA competitions, such as Sulak, Yılmaz, and Özkaynak (2018) and Taufik (2021), considered teams, instead of players, as nodes. Sulak, Yılmaz, and Özkaynak (2018) investigate the relationships between the teams that participated in the UEFA Europa League between 2004 and 2017. Although this study uses a higher-level abstraction than game-by-game or player-by-player analysis, it was only partially successful in capturing more complex factors such as the international political and economic relationships at the country level. Taufik (2021) conducts social network analysis on 2017–2018 UEFA Champions League matches. This analysis covers a lot of ground, from meta-path analysis to centrality measures and community detection. Despite being detailed and answering some popular questions about the specified season, this study also does not discuss international relations or the longitudinal effects of those relations.

On the other hand, there is another group of studies that focus only on higher-level analysis of football teams and leagues but do not use social network analysis techniques to model their data (Leontijević, Janković, and Tomić 2019; Felix, Barbosa, da F. Vieira, and Xavier 2019; Feddersen 2006; Ruta, Lorenzon, Nicolò, and Gorler 2022). Leontijević, Janković, and Tomić (2019) compare the attacking characteristics of football teams. They cluster the teams according to their UEFA coefficients and then use random sampling to choose a representative team for each cluster. They point out some differences in technical and tactical performance between the teams of different clusters by presenting statistical tests on a number of variables. Although this study considers the home countries of the teams in a limited scope, it analyzes each cluster representative separately, ignoring the relations between different teams and countries. One study that focuses on the international football market is Felix, Barbosa, da F. Vieira, and Xavier (2019), who downloaded transfer data from the Transfermarkt website and analyzed buyer and seller countries from a pure transaction perspective. That study does not focus on UEFA competitions or European countries in particular. Another study that considers economic aspects of football is Feddersen (2006), which aims to uncover whether competitive balance changes when there is an imbalance in the financial situation of the clubs. The author shows that in the German “Bundesliga” there is no significant relationship between team revenues and competitive balance. However, this study is limited to the German league, i.e., the results do not extend to other European countries, and it ignores the cultural and social factors related to the different home states of the teams and the long-term relations among teams. The analysis by Ruta, Lorenzon, Nicolò, and Gorler (2022) takes an interesting approach to understanding the economic impact of UEFA participation. It shows that prize money from European club competitions organized by UEFA has a significant and specific impact on national sport performance, both in the season in which the award is given and in the following season.

Taken together, the studies above cover player and team success and its application to game strategy development, as well as some higher-level analysis at the country level. However, there is a gap in understanding the motivational factors affecting players who play in different countries in regular game seasons and national leagues. While several studies manage to capture a subset of these factors, they either (i) focus on player-to-player or team-to-team relations and do not take higher-level interactions into consideration, or (ii) use indirect indicators of these high-level interactions, such as revenues and transfers, but fall short of providing a comprehensive model because they analyze these indicators independently of each other. In this study, we aim to address this gap by providing a thorough analysis of UEFA data over several decades using clustering and social network analysis techniques.

Methodology

Initially, we constructed the country-level networks for the last eight occasions of the tournament (i.e., 1992–2020). To achieve consistency among these networks, the union of all countries that appeared in at least one tournament is used as a common node set, and only the edges among these nodes change from graph to graph. While it is intuitive to analyze this data as a dynamic system, we prefer a static analysis approach because of the low temporal resolution (Farine 2018). We use several algorithms to observe the changes in clusters over the years. In addition, we investigate the correlation between patterns in these networks and historical and economic events.

Dataset

UEFA European Football Championship is an international football tournament held every four years (U.E. Committee 2020). Each nation submits a squad of a fixed number of players, and some of them might be playing in a team that is in a different country from their country of origin. Using the countries that attended this tournament on the last eight occasions, from 1992 to 2020, we obtained a set of graphs with a directed edge from country A to country B if at least one player from country A is playing in a club in country B. The weight of the edge denotes the total number of players playing in the country B that are citizens of the country A. Although a player might have played in different teams or even in different countries between two consecutive tournaments, for the sake of regularity and ease of data collection we limit the input to the squads of the tournaments.

We model our data set as a collection of directed and weighted networks. The direction of an edge represents the direction of transfer. If a player X, originally from country A, plays for a club in country B, the weight of the link A → B in the Country × Country network increases by 1. The higher the weight, the stronger the relationship between the two countries that the edge links. For algorithms that rely on shortest-path computation, we invert the edge weights to use them as distances, since two countries linked by a stronger relationship should have a smaller distance between them. In our analysis, we later use edge weights and directions in centrality metrics and clustering algorithms to identify influential countries, domestic player bias, and groups of countries.
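The construction described above can be illustrated with a minimal sketch. The squad records below are hypothetical and not taken from the dataset, the function names are illustrative, and the inversion is assumed to be the reciprocal 1/w, since the text does not state the exact form used.

```python
from collections import defaultdict

# Hypothetical squad records: (national team, country of the player's club).
# Illustrative rows only; they are not taken from the paper's dataset.
squad = [
    ("Wales", "England"),
    ("Wales", "England"),
    ("Wales", "Wales"),
    ("Denmark", "Germany"),
    ("Denmark", "England"),
    ("England", "England"),
]

def build_network(records):
    """Return weights[a][b] = number of players from country a at clubs in country b."""
    weights = defaultdict(lambda: defaultdict(int))
    for nation, club_country in records:
        weights[nation][club_country] += 1
    return {a: dict(targets) for a, targets in weights.items()}

def to_distances(weights):
    """Assumed inversion: distance = 1 / weight, so stronger ties are shorter."""
    return {a: {b: 1.0 / w for b, w in targets.items()}
            for a, targets in weights.items()}

network = build_network(squad)
distances = to_distances(network)
print(network["Wales"])    # {'England': 2, 'Wales': 1}
print(distances["Wales"])  # {'England': 0.5, 'Wales': 1.0}
```

Self-loops such as Wales → Wales are kept in this sketch because they carry the domestic-player counts used later for the DPB metric.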

Our approach also accounts for cases where the same player X has played in several countries. We model each year's tournament as an independent network, so such a player contributes to different edges in the networks of the respective tournaments. Note, however, that these national-level championship tournaments are held every four years, and players might play in different countries and leagues between tournaments. Such data is out of the scope of this work, since we focus on national football games that are played as a part of the UEFA European Football Championship tournament. Moreover, for the national football games, the full list of participating players is publicly and officially announced (Worldfootball 2020). Hence, our data set is free of issues such as missing or inaccurate data. In all games we have studied, the teams and their players are officially announced.

Domestic Player Bias (DPB)

We analyze how many of each nation's players are playing in their own country. For a graph represented by its adjacency matrix A, with n teams each consisting of p players, the average domestic player bias (DPB) is calculated as in Eqn. 1. We expect a decrease in this metric over the years due to the globalization of the football market and the increase in the number of international transfers. However, the behavior might differ between countries. Although the globalization of football has increased, some countries remain local and have most of their national players playing in their own leagues. The underlying reasons might be related to the sport itself or might be a result of the economic or sociopolitical situation of the country.

$$\mathrm{DPB} = \frac{\sum_{i=1}^{n} A_{ii}}{n \times p} \qquad (1)$$

In Eqn. 1, the self-loop weight A_ii is the number of players in the squad of country i who play for a club in their own country. Network density is another metric we analyze, since low network density means low interaction between countries, with only a few countries sharing players. The analysis compares the EFCs in the ’90s to the most recent EFCs.
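As a small worked instance of Eqn. 1 and of network density, the sketch below uses a hypothetical three-country adjacency matrix in which every node is a participating team with a squad of p = 4. The density definition (directed edges, self-loops excluded) is an assumption, since the text does not spell out which variant is used.

```python
def domestic_player_bias(adj, squad_size):
    """Eqn. 1: sum of self-loop weights A_ii divided by n * p."""
    n = len(adj)
    domestic = sum(adj[i][i] for i in range(n))
    return domestic / (n * squad_size)

def density(adj):
    """Assumed definition: present directed edges / possible ones, no self-loops."""
    n = len(adj)
    present = sum(1 for i in range(n) for j in range(n)
                  if i != j and adj[i][j] > 0)
    return present / (n * (n - 1))

# Hypothetical network: row i, column j = players from country i
# who play for clubs in country j. Each row sums to the squad size 4.
A = [
    [3, 1, 0],
    [2, 2, 0],
    [1, 2, 1],
]
print(domestic_player_bias(A, squad_size=4))  # 0.5  (6 domestic players out of 12)
print(round(density(A), 3))                   # 0.667 (4 of 6 possible edges)
```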

Most Influential Countries

In global football, not all leagues compete at a similar level. Some countries have more established and developed leagues, thus are able to attract more skillful football players. These leagues serve as talent magnets, pulling the best national players from all around the world (and Europe), which develops the hosting nation's national football level. As a result, more national footballers join the clubs in these leagues.

We analyze which leagues have had this advantageous position between two consecutive tournaments. Different countries might hold this position in different championships, or similar countries might occur on each occasion. To perform a well-rounded analysis, we investigate both Eigenvector centrality which is a network-level, structural measurement and degree centrality which is a node level measurement.

Eigenvector centrality has a recursive definition: a node is central if it is connected to other central nodes, which provides insight into who is more central in the global network. The leagues that attract more skillful players are usually economically richer leagues (e.g. England, Germany, Italy), and it is harder for leagues in countries that belong to the close-knit groups around them to become influential. Hence, Eigenvector centrality provides a good, stable measure for identifying the more influential countries.

In addition, since we have directed and weighted networks, the importance of the countries is also expected to be highly correlated to their in-degree centrality. A country that is more of a talent magnet is able to attract more incoming foreign players, which is indicated by a higher in-degree centrality. As demonstrated earlier in Taufik (2021), we infer the most influential countries in football using centrality metrics.
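Both measures can be sketched with a few lines of plain Python. The example below is an assumption-laden illustration rather than the computation used in the study: it runs power iteration over incoming edges on a hypothetical three-country matrix, scales the scores so that the top country has value 1.00, and ignores self-loops.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-9):
    """Power iteration over incoming edges; scores are scaled so the maximum is 1."""
    n = len(adj)
    scores = [1.0] * n
    for _ in range(iterations):
        # A country's score is the weighted sum of the scores of the
        # countries that send players to it.
        new = [sum(adj[i][j] * scores[i] for i in range(n)) for j in range(n)]
        top = max(new)
        if top == 0:
            return new  # no incoming edges anywhere
        new = [v / top for v in new]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores

def weighted_in_degree(adj):
    """Number of incoming foreign players per country (self-loops excluded)."""
    n = len(adj)
    return [sum(adj[i][j] for i in range(n) if i != j) for j in range(n)]

# Hypothetical network: adj[i][j] = players from country i at clubs in country j.
adj = [
    [0, 3, 1],
    [2, 0, 1],
    [2, 2, 0],
]
scores = eigenvector_centrality(adj)
print([round(s, 2) for s in scores])
print(weighted_in_degree(adj))  # [4, 5, 2]
```

In this toy network the second country receives the most players and also ends up with the highest Eigenvector score, which mirrors the expectation in the text that the two measures are highly correlated.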

Clusters

As with most networks, our networks have some structural properties. Clustering between countries is one of the most prominent features observed across tournaments. Except for some of the oldest tournaments we used, which have higher domestic bias, there are many countries whose players play in each other's leagues. As a result, a clique-like structure is formed, where the connected countries might also be related to each other socio-politically. Although most of the participating countries are part of the European Union or of European culture, there are still smaller clusters containing neighboring countries or countries that are geographically and culturally closer to each other, such as England-Wales, Austria-Switzerland, or the Scandinavian countries. We use different clustering algorithms to analyze this structural property: Girvan-Newman (Girvan and Newman 2002), DBSCAN (Ester, Kriegel, Sander, and Xu 1996), and the modularity-based Louvain algorithm (Blondel, Guillaume, Lambiotte, and Lefebvre 2008).

The Girvan-Newman algorithm (Girvan and Newman 2002) is a divisive clustering algorithm that focuses on the edges most likely to lie between communities. The algorithm uses edge betweenness, which is the number of shortest paths that go through an edge in the graph. A high edge betweenness score indicates a bridge-like connector between two parts of the network. Girvan-Newman removes the edge with the highest edge betweenness at each iteration, so that the network progressively splits into connected components. The algorithm runs until there is no edge left, which implies a worst-case run-time complexity of O(m²n) for a network with m edges and n nodes.
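A minimal sketch of one Girvan-Newman split is given below. It is a simplification under stated assumptions: the graph is treated as undirected and unweighted, the node labels are hypothetical, and betweenness is recomputed after every removal. The networks in this study are directed and weighted, so this only illustrates the mechanism.

```python
from collections import deque

def adjacency(nodes, edges):
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def components(nodes, edges):
    """Connected components found by breadth-first search."""
    adj, seen, found = adjacency(nodes, edges), set(), []
    for start in nodes:
        if start in seen:
            continue
        group, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v] - group:
                group.add(w)
                queue.append(w)
        seen |= group
        found.append(group)
    return found

def edge_betweenness(nodes, edges):
    """Brandes-style accumulation; every path is counted from both endpoints."""
    adj = adjacency(nodes, edges)
    score = {e: 0.0 for e in edges}
    for s in nodes:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score

def girvan_newman_split(nodes, edges):
    """Remove highest-betweenness edges until the component count increases."""
    edges = set(edges)
    start = len(components(nodes, edges))
    while edges and len(components(nodes, edges)) == start:
        score = edge_betweenness(nodes, edges)
        edges.remove(max(score, key=score.get))
    return components(nodes, edges)

# Hypothetical graph: two triangles joined by the single bridge C-D.
nodes = ["A", "B", "C", "D", "E", "F"]
pairs = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
edges = {frozenset(p) for p in pairs}
result = girvan_newman_split(nodes, edges)
print(sorted(sorted(g) for g in result))  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

The bridge C-D carries every shortest path between the two triangles, so it has the highest betweenness and is removed first.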

We also used the Louvain algorithm (Blondel, Guillaume, Lambiotte, and Lefebvre 2008) to obtain communities from a different perspective. It follows a bottom-up approach, where each node is a standalone cluster at the beginning and clusters are merged at each step. As its merging criterion, Louvain uses modularity, which measures the strength of the division of a network into modules. For a weighted graph with adjacency matrix A, the modularity is defined in Eqn. 2, where m is the sum of all of the edge weights in the graph, δ is the Kronecker delta function, k_i and k_j are the sums of the weights of the edges attached to nodes i and j, and c_i and c_j are the communities of nodes i and j.

$$Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j), \qquad \delta(x, y) = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

As Louvain tries to maximize Q, the following steps are repeated iteratively.

1. For each node i, calculate the change in modularity when i is removed from its own community and moved into the community of each neighbor j of i. Then, place i into the community that resulted in the greatest modularity increase.

2. Group all of the nodes in the same community and build a new network such that any links within the same community are represented as self loops and links from multiple nodes in the same community to the same node or different ones in a different community are represented by weighted edges between communities.

3. Repeat Steps 1 & 2 for the new network.
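The quantity that these steps maximize, Eqn. 2, can be evaluated directly for any given partition. The sketch below assumes a symmetric (undirected) adjacency matrix and uses a hypothetical graph of two triangles joined by one edge. It only computes Q and does not implement the Louvain merging procedure itself.

```python
def modularity(adj, community):
    """Eqn. 2 for a symmetric weighted adjacency matrix.

    community[i] is the community label c_i of node i.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]  # k_i
    two_m = sum(degree)                 # 2m
    q = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:  # Kronecker delta
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} linked by edge 2-3.
pairs = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in pairs:
    A[i][j] = A[j][i] = 1

q_split = modularity(A, [0, 0, 0, 1, 1, 1])
q_single = modularity(A, [0, 0, 0, 0, 0, 0])
print(round(q_split, 4))  # 0.3571
```

Placing all six nodes in one community gives a Q of zero (up to floating-point rounding), so the two-triangle partition is the better division according to modularity.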

It should be noted that the Louvain algorithm has a resolution limit. Modularity compares the number of edges in a community with the expected number of edges in the same community in a random network with the same number of nodes and the same degrees. This implicitly assumes that any two nodes in the network can attract each other. When the network is large and each node interacts with only a small part of it, Louvain continues merging clusters to optimize modularity while discarding cluster-specific features, and ends up with a small number of large communities. This issue in community resolution can be generalized as follows: the more extended the graph is, the fewer clusters Louvain finds (Cadot, Lelu, and Zi 2018).

While hierarchical clustering approaches, such as Girvan-Newman and Louvain, have their advantages, they assume every node in the network is relevant. Density-based clustering methods focus on local characteristics, take relatively dense locations, and assume other nodes to be outliers to overcome the disadvantages of this assumption. To investigate more on this, we also applied DBSCAN (Ester, Kriegel, Sander, and Xu 1996) for clustering our networks. As a density-based method, DBSCAN requires two parameters: a radius r of the neighborhood and the minimum number of neighbors θ necessary for a node to be considered the seed of a cluster. Within the method, two nodes i and j are density-connected if there is a node k such that both i and j are reachable from k with a distance less than the radius r. If a vertex has more density-connected neighbors than θ, it is an eligible cluster seed. Then, the algorithm merges smaller clusters until each cluster satisfies two properties:

All vertices within a cluster are mutually density-connected.

If a point is density-reachable from some point of the cluster, it is part of the cluster as well.
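A compact DBSCAN sketch over a precomputed distance matrix is shown below. It follows the textbook formulation rather than any particular implementation used in this study. The parameter names `radius` and `theta` mirror r and θ from the text, the seed condition is assumed to be "at least θ neighbors within r", and the one-dimensional positions are hypothetical stand-ins for distances obtained from inverted edge weights.

```python
NOISE = -1

def dbscan(dist, radius, theta):
    """Return one label per node: a cluster id (0, 1, ...) or NOISE for outliers."""
    n = len(dist)
    labels = [None] * n  # None means not visited yet

    def neighbors(i):
        return [j for j in range(n) if j != i and dist[i][j] <= radius]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < theta:
            labels[i] = NOISE  # may later be claimed as a border node
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border node reachable from a seed
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= theta:  # j is itself a seed: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels

# Hypothetical positions on a line; pairwise distance is the absolute difference.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 9.0]
dist = [[abs(a - b) for b in points] for a in points]
labels = dbscan(dist, radius=0.15, theta=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated last point is labeled as an outlier instead of being forced into a cluster, which is the behavior that distinguishes DBSCAN from the two other algorithms.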

To summarize, in this study, we apply three different clustering algorithms because they usually yield rather different results for the same network. The Girvan-Newman algorithm relies on the identification of weak ties between groups to separate a network into groups. The Louvain algorithm, in contrast, tries to maximize modularity, which produces a low Q value when the groups are fuzzy and a higher Q value when the groups in the network can be distinguished more clearly. This kind of fuzziness is not something the Girvan-Newman algorithm captures well. DBSCAN is the third algorithm we chose because it is able to find arbitrarily shaped clusters and clusters with noise (i.e., outliers). Its ability to identify and work with outliers helps us provide a more complete picture in our structure and group analysis.

Results & Discussion
Domestic Player Bias

Domestic player bias (DPB) reflects the recruitment policy of national teams, measuring how willing they are to select homegrown talents. It depends on the number of players playing in their own country, which is denoted by the weights of the self loops in a network, as discussed in Section 3.2. We calculated this metric for all networks, and the average percentage of domestic players for each tournament is presented in Table 1. We observed that the average percentage of domestic players is consistent with a normal distribution (μ = 52.21 and σ = 12.54, Kolmogorov-Smirnov normality score 0.205 with p-value = 0.82551).

Table 1

Average Percentage of Domestic Players over Different Euro Tournaments

Year    Average Percentage of Domestic Players
1992    66.3
1996    69.9
2000    50.9
2004    54.1
2008    52.7
2012    53.5
2016    35.5
2020    34.8

The overall rate of domestic players has decreased over the years as expected due to globalization around the world. This is also in line with the negative correlation between average domestic player percentage and network density (Pearson correlation coefficient ρ = −0.262).
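The summary statistics quoted in this section can be reproduced from the published tables. The sketch below takes the domestic-player percentages from Table 1 and the densities from Table 2. The reported σ matches the sample standard deviation (n − 1 in the denominator), which is therefore the variant assumed here, and the Pearson helper is written out by hand for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Values copied from Table 1 (DPB, in percent) and Table 2 (density),
# ordered by tournament from Euro 1992 to Euro 2020.
dpb = [66.3, 69.9, 50.9, 54.1, 52.7, 53.5, 35.5, 34.8]
density = [0.127, 0.171, 0.195, 0.190, 0.200, 0.148, 0.093, 0.286]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

rho = pearson(dpb, density)
print(round(mean(dpb), 2), round(stdev(dpb), 2))  # 52.21 12.54
print(round(rho, 3))                              # -0.262
```

With only eight tournaments, a coefficient of this size indicates a weak negative association rather than a strong one.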

The most significant decrease in DPB happened in Euro 2016, when the number of teams increased from 16 to 24. The increase created an opportunity for countries that are not regular participants of the Euros and allowed some of them to make their debuts in the tournament, such as Albania in Euro 2016 and Finland in Euro 2020. The addition of these countries results in a decrease in DPB, since their leagues are not up to the standards of the leading European leagues and their most talented national players generally play in clubs outside of their country. These examples support this explanation: Albania had only two domestic players in Euro 2016, and Finland had only one domestic player in Euro 2020.

Country Influence

We provide a summary of top-level network metrics in Table 2. For the analysis of influential countries, the Eigenvector centrality metric allows us to track the influence of countries progressively: as a country becomes more significant, its outgoing edges become more important for the network.

Table 2

Network Metrics

Network    # of Nodes  Density  Diameter  Avg. Degree  Avg. Path Length  Modularity  Avg. Eigen Centrality  Avg. Clustering Coef.
Euro 1992  18          0.127    2         4.334        1.244             0.406       0.212                  0.143
Euro 1996  21          0.171    3         6.857        1.422             0.264       0.180                  0.196
Euro 2000  22          0.195    3         8.182        1.429             0.250       0.224                  0.247
Euro 2004  23          0.190    4         8.348        1.388             0.229       0.205                  0.213
Euro 2008  23          0.200    2         8.782        1.254             0.239       0.206                  0.242
Euro 2012  25          0.148    5         7.120        1.603             0.259       0.132                  0.199
Euro 2016  41          0.093    6         7.414        2.116             0.317       0.125                  0.176
Euro 2020  24          0.286    8         13.167       2.053             0.173       0.206                  0.397

We analyzed the network generated for each year and observed that several countries frequently occur at the top of the list according to Eigenvector centrality, also shown in Table 3. The five countries with the highest Eigenvector centralities in Euro 2020, England, Germany, Italy, Spain, and France, which also have the largest node size in Figure 1, are the countries whose leagues are generally referred to as “Big Five” or “Top Five” in Europe (Breaking the lines 2021). These countries frequently have the highest Eigenvector centrality in most analyzed networks. Furthermore, there is a significant gap between the Eigenvector centrality values of the fifth country France (0.76), and the sixth country Turkey (0.13) in the Euro2020 network. This confirms that Eigenvector centrality is a successful choice in identifying the most influential countries in our dataset.

Table 3

Top Five Countries According to Eigenvector Centrality in Each Tournament

Year    Countries
1992    Italy (1.00), England (0.88), France (0.77), Germany (0.27), Scotland (0.08)
1996    Italy (1.00), England (0.51), Scotland (0.44), France (0.41), Turkey (0.30)
2000    Spain (1.00), England (0.96), Italy (0.93), Germany (0.46), France (0.34)
2004    England (1.00), Spain (0.97), Germany (0.90), Italy (0.56), France (0.55)
2008    Germany (1.00), England (0.98), Spain (0.95), Russia (0.39), Italy (0.37)
2012    England (1.00), Italy (0.49), Spain (0.45), Germany (0.32), France (0.17)
2016    England (1.00), Germany (0.75), Italy (0.71), Spain (0.69), Turkey (0.44)
2020    England (1.00), Germany (0.82), Italy (0.81), Spain (0.80), France (0.76)

Figure 1

Euro2020 Network where the node sizes are based on Eigenvector centrality

According to the total economic revenue of the leagues in the 2018/19 season, the five countries that have the highest Eigenvector centrality metric in our analysis are also the top five countries that have the largest economic revenue (Deloitte 2020). The correlation between the financial revenue of the leagues and the Eigenvector centrality value of the countries (ρ = 0.88) can be observed in the case of Russia and Turkey as well, where these countries are sixth and seventh respectively in economic revenue while being seventh and sixth according to Eigenvector centrality metric.

We also observe that although Scotland had a relatively high Eigenvector centrality in 1992 and 1996, its rank dropped in later years. This might be related to Scotland's close political relationship with England and its high domestic player bias during those years.

Country Clusters

Communities in social networks allow us to create large-scale maps of the networks. In the case of football, communities are critical as they can imply economic or cultural interaction patterns among countries. Applying the Girvan-Newman algorithm, we observed that most networks form a giant, central cluster, except for the network of Euro 1992, which is visualized in Figure 2. For this year, Girvan-Newman generated four relatively balanced clusters. It is important to note that the Soviet Union qualified for the final tournament shortly before its dissolution in 1991, and the team still attended the Euro 1992 tournament under the name “Commonwealth of Independent States (CIS).” Hence, the clustering pattern in the Euro 1992 network approximately reflects the attitude of other countries to the USSR. For example, France, England, Denmark, and Turkey are among the early members of the North Atlantic Treaty Organization (NATO), and they are also included in the same community. Meanwhile, in the UK, Prime Minister Margaret Thatcher was known for her anti-Scottish attitude and policies (Czapiewski 2016). This may be one of the reasons why Scotland is grouped with Russia instead of its neighbor England in this network. For the networks of later years, Girvan-Newman tends to find a single big community and several significantly smaller ones. An example of this behavior can be seen in Figure 3, where the Girvan-Newman algorithm is applied to the network of Euro 2008. Such a pattern may be due to the absence of political tension as extensive as the Cold War.

Figure 2

Euro1992 Network where nodes are colored with respect to the cluster assignment by Girvan-Newman Algorithm. Each color represents a different cluster.

Figure 3

Euro2008 Network where nodes are colored with respect to the cluster assignment by Girvan-Newman Algorithm. Nodes with pink are assigned to the same cluster whereas the gray nodes are clusters on their own.

On the other hand, the Louvain algorithm results in three to five similarly sized clusters in all of our networks. Figure 4 shows the clusters obtained with the Louvain algorithm for the network of Euro 2020. Despite its name, the Euro 2020 tournament was held in 2021. Therefore, we expect to see several post-pandemic patterns due to COVID-19 and the tight restrictions that countries put into effect in the previous years. For example, the average degree of nodes in the Euro 2020 network (13.167) is nearly double what it was in 2016 (7.414). This may be because people, including football players, were overwhelmed by the restrictions and preferred to move across countries for a sense of freshness. In general, we observe more balanced clusters with the Louvain algorithm than with the Girvan-Newman algorithm. This may be due to the implicit assumption, introduced by modularity, that each node is attracted by all others.

Figure 4

Euro2020 Network where nodes are colored with respect to the cluster assignment by Louvain Algorithm. Each color represents a different cluster.
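The quantity that Louvain greedily maximizes is modularity, which compares the fraction of edges inside each community with the fraction expected if edges were placed at random in proportion to node degrees. The sketch below computes only this score for a given partition; it is not the Louvain optimization itself. It assumes an unweighted, undirected graph, and the function name `modularity`, the node labels, and the edges are hypothetical and chosen purely for illustration.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an unweighted, undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Edges whose two endpoints share a community.
    internal = {}
    for u, v in edges:
        if community_of[u] == community_of[v]:
            c = community_of[u]
            internal[c] = internal.get(c, 0) + 1

    # Total degree per community; this drives the random-placement baseline.
    total_degree = {}
    for node, k in degree.items():
        c = community_of[node]
        total_degree[c] = total_degree.get(c, 0) + k

    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in total_degree.items())


# Invented toy graph: two triangles joined by one bridge.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(q_split, q_single)
```

The squared degree term is where the "every node can attach to every other node" baseline enters: placing all nodes in one giant community earns no credit beyond that baseline, so a modularity-based method is pushed toward several groups of comparable weight.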

To observe whether the clusters generated by the Louvain algorithm suffer from the resolution limit, we also obtain clusters using DBSCAN. The cluster assignments given by DBSCAN for the Euro2020 network are visualized in Figure 5. The general structure of the DBSCAN cluster assignments is similar to that of Girvan-Newman: DBSCAN also tends to find a single community of considerable size and a higher number of smaller groups. However, when the clusters found for Euro2020 by the two algorithms are compared visually, we observe that Girvan-Newman left some of the Big Five countries, namely Italy and England, alone in the network, as shown in Figure 6. Therefore, even though both algorithms give structurally similar clusters, the results of DBSCAN seem to be semantically more accurate.

Figure 5

Euro2020 Network where nodes are colored with respect to the cluster assignment by DBSCAN. Nodes in purplish pink are assigned to the same cluster, whereas all other colors represent outliers.

Figure 6

Euro2020 Network where nodes are colored with respect to the cluster assignment by Girvan-Newman Algorithm. Pink nodes are assigned to the same cluster, whereas each gray node is a cluster on its own.
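DBSCAN is defined on points with a distance measure, so applying it to a network requires choosing a distance between nodes. This section does not state which distance was used, so the sketch below makes an explicit assumption: the distance between two nodes is the number of hops on the shortest path between them. The parameters `eps` and `min_pts`, the function names, and the toy graph are likewise hypothetical.

```python
from collections import deque


def hop_distances(adj, source):
    """Shortest-path hop counts from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def dbscan_on_graph(adj, eps=1, min_pts=3):
    """DBSCAN with hop distance as the metric (an assumption of this sketch).

    Returns a dict node -> cluster id, with -1 marking outliers.
    A neighborhood includes the node itself, as in standard DBSCAN.
    """
    neighbours = {
        v: {w for w, d in hop_distances(adj, v).items() if d <= eps}
        for v in adj
    }
    labels = {}
    cluster = 0
    for v in adj:
        if v in labels:
            continue
        if len(neighbours[v]) < min_pts:
            labels[v] = -1  # provisional outlier; may become a border node
            continue
        # v is a core node: grow a new cluster outward from it.
        labels[v] = cluster
        frontier = deque(neighbours[v] - {v})
        while frontier:
            w = frontier.popleft()
            if labels.get(w, -1) != -1:
                continue  # already assigned to a cluster
            labels[w] = cluster
            if len(neighbours[w]) >= min_pts:
                frontier.extend(neighbours[w])  # w is core: keep expanding
        cluster += 1
    return labels


# Invented toy graph: a dense core (A-D), a pendant node E, and a detached pair F-G.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("B", "D"), ("C", "D"), ("A", "E"), ("F", "G")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

labels = dbscan_on_graph(adj, eps=1, min_pts=3)
print(labels)
```

Under these assumptions, the densely connected nodes and the pendant attached to them fall into one cluster, while sparsely connected nodes are marked as outliers, mirroring the single sizeable community plus outliers pattern reported for DBSCAN.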

Limitations

In this study, we analyze UEFA tournament data from 1992 to 2020 and model this data as a collection of networks, with one network for every tournament. However, there is a four-year gap between consecutive tournaments due to the nature of the UEFA European Football Championship. Hence, our data is not continuous. We note that this time gap is a potential limitation for our study, as we would require a higher time resolution to conduct dynamic analysis on our network system. Although it is impractical to generate daily or weekly data for football teams, further studies on network models built for different tournaments could be used to interpolate between time points and, hence, enable more comprehensive analyses. Another potential limitation of this study is that we focus only on countries in Europe and on the historical and political events that affected Europe within the period in which the UEFA European Football Championship has been held. This may introduce biases, especially in the country influence estimates and the clusters, as we are not considering worldwide relations. As a further research direction, we believe it would be beneficial to model and analyze data from several other tournaments with higher country inclusion rates.

Conclusion

In this paper, we analyzed trends in the globalization of football and the most influential leagues and countries in Europe. In particular, we modeled the interaction between national squads and the national leagues of the countries that attended the European Football Championship, using the social networks we constructed for the last eight UEFA tournaments, which are held every four years (i.e., three decades). Although the last eight tournaments do not cover the whole period continuously, they still provide frequent and reliable intervals. Based on our analysis results, we observe a strong tendency among national players to play in foreign leagues in more recent tournaments. Considering the different clustering algorithms we have used in analyzing our dataset, we recommend the modularity-based Louvain algorithm for the analysis of similar datasets, since we obtained more balanced communities with this algorithm compared to the hierarchical algorithms that yield one giant cluster.

We also analyzed the influential countries in European football using Eigenvector centrality and found a correlation between the economic revenue of the leagues and the Eigenvector centrality. Our study supports the use of Eigenvector centrality as a suitable and generalizable choice of metric, as it correctly identified the Big Five of the European Football Championship and matches the official ranking provided by UEFA (Breaking the lines 2021). While our analyses are performed on social networks based on football players and their choices, our results indicate that the approach we have taken is able to reveal generalizable sociopolitical and economic inferences about the European countries.

eISSN:
1529-1227
Language:
English
Publication frequency:
Volume Open
Journal subjects:
Social Sciences, other