With trade barriers being removed and driven by the scientific and technological revolution, the global economic integration based on the Global Value Chain (GVC) is already established. In order to analyze various economic phenomena happened within and across countries or regions on the GVC, it is very important to find and adopt appropriate database to illustrate the intricate intersector economic activities. In recent years, with the penetration of Complex Network theory in various disciplines, scholars found that the matrix structure of InputOutput (IO) data can be directly or slightly modified to be the adjacency matrix of the network models, which will be the basis of Social Network Analysis (SNA). Unfortunately, lots of related studies were negatively influenced by the features of IO network, which are the highly dense structure and the heterogeneous edge weight distribution. For instance, the degree of each node is the same (equal to the maximum), so as the between centrality, closeness centrality, clustering coefficient, etc. Therefore, we need to propose effective methods to extract the backbone of IO network, by reducing the scale of network and retaining only the meaningful connections.
Ideally, we hope that the crucial information of network could be retained as much as possible with the decline of number of edges. Sometimes, a weighted edge is numerically insignificant but functionally significant, and reckless deletion will result in a useless broken structure. Therefore, the effective network backbone extraction method has to be a tradeoff between the number of retained edges and the gross of retained weights.
In this paper, considering that the edges of the IO networks represent the flow of bilateral economic activities, we obtain the IO network backbones by retaining the Strongest Relevance Path Length (SRPL) (Xing et al., 2019) between industry sectors, which is called the Searching Paths (SP) method. In addition, there is an unequal relationship between inputs and outputs. We propose a new heuristic filtering algorithm Filtering Edges (FE) based on the assumption that a minority of edges hold the most weights in the network. The SP method and FE method come from global and local perspectives respectively, and both of them can converge naturally through algorithms rather than rely on artificial parameters.
The remainder of this paper is organized as follows: Section 2 contains the literature review and the shortcomings of the present study. Section 3 introduces the two methods and their mixed strategies, as well as the specific implementation process of network pruning. Section 3 describes the data and modeling process. Section 4 compares the method proposed in this paper with the existing classical methods. Section 5 is the conclusions, in which the main conclusions and further research direction of this paper are summarized.
The problems that higher network density causes to SNA are not unique to IO network. For example, regionally clustered edges pose challenges to explicitly depict the overall structure of geographical network (Hennemann, 2013). Higher network density is a serious obstacle for cluster and spectrum analyses, which are usually applied in the sparse networks. Therefore, scholars in different fields including sociology (Nick et al., 2013), economics (Marotta et al., 2015), and computer science (Foti, Hughes, & Rockmore, 2011) proposed a series of methods for extracting network backbones. In detail, these methods can be classified into three categories—network sampling, coarse graining, and filtering. First, Network sampling means obtaining the main network structure through sampling representative nodes or edges, which is extensively overviewed in the study of Ahmed, Neville, and Kompella (2014). Second, to turn the original network into a smaller skeleton, coarse graining means merging nodes or edges according to certain rules (Itzkovitz et al., 2004), such as the attribute of node or edge (Toivonen et al., 2012), community structure (Blagus et al., 2012). The most common one is the filterbased method. It typically defines a criterion for a node or edge to examine their importance or relevance to the network, after which the redundant ones are removed in a stepwise procedure (Dai, Derudder, & Liu, 2018). In practice, there are two filtering perspectives.
An important angle is to preserve the core by removing structures that are under an overall topological characteristic, which can be called the global method. For nodes, a range of indices such as degree (Carmi et al., 2007; Kitsak et al., 2010; Siganos, Tauro, & Faloutsos, 2006), local centrality (Chen et al., 2012) and Leader Rank (Li et al., 2014; Lv et al., 2011) have been applied to identify influential nodes. Further, backbone network can be obtained by reserving the smallest possible subset of highly influential nodes and their relevant links (Malang et al., 2020). As for edges, link weights (Zhang & Zhu, 2013; Zhang et al., 2018; Zhao et al., 2014) and network motifs (Cao, Ding, & Shi, 2019) were used to extract functional backbones while edge betweenness (Kim, Noh, & Jeong, 2004) and the shortest path (Grady, Thiemann, & Brockmann, 2011; Zhang et al., 2014) were used to preserve more structural features. Among them, the simplest method is to remove all edges with weight below a certain threshold, i.e. the Global Threshold (GT) method. Zhang et al. defined hbackbone method that extract both the highstrength links and network bridges, which can preserve important information about both network function and structure (Zhang, Stanley, & Ye, 2018).
The other is to retain the components which are statistically significant at the local level. Disparity filter (DF) is the most representative method that focuses on the local distribution of edge weights, preserving the edges that represent statistically significant deviations with respect to a null model (Serrano, Boguna, & Vespignani, 2009). The improvement of DF comes from different aspects (Coscia & Neffke, 2017). For example, Radicchi, Ramasco, & Fortunato, proposed a weightfiltering technique based on a global null model, which accounts for the full topology while preserving the heterogeneity of the weight distribution (Radicchi, Ramasco, and Fortunato, 2011). Foti, Hughes, and Rockmore presented an improvement relying solely on the empirical distribution to judge statistical significance instead of any null model (Foti, Hughes, & Rockmore, 2011). Bu et al. designed a stricter filter rule and an iterative local search model to avoid outliers and improve efficiency (Bu et al., 2014). Zhang et al. and Cao, J., Ding, C. and Shi, B. identified which edges for each node should be preserved by evaluating whether or not their link involvement or motif wights are compatible with the null model respectively (Cao, Ding, & Shi, 2019; Zhang et al., 2011).
However, most of the previous studies on backbone extraction have the following aspects to be further considered: 1) Most of the proposed methods are general without considering the actual meaning of edge weight. The distribution of industrial relationship in international trade networks, such as IO network, is obviously unequal or heterogeneous. 2) Artificial parameters are required in many methods, which make the backbone network not unique. 3) Both global and local perspectives have their disadvantages. The global methods may overlook the local heterogeneity and multiscale structures of the network, and the local methods may ignore “the crucial weak ties” (Granovetter, 1973) that connect different communities. For these reasons, we need backbone extraction methods that are sensitive to the heterogeneity of weights and independent of parameters.
According to the comparison between real IO relations and SRPL paths in Xing and Han (2021), we propose a global backbone extraction method is efficiently to reserving the main industrial relevance with the fastest reaction speed and the most frequent traffic.
When the corresponding elements of the new matrix
Taking
Enlightened by HIndex (Hirsch, 2005) and the Pareto Principle, we present a novel heuristic algorithm to effectively prune the dense and weighted IO network, which is named XIndex Filtering Algorithm (XIFA). XIFA algorithm is a sort of mixed quantitative index that takes into account the scope and intensity of industrial sectors and extracts the main topology of IO network according to the heterogeneity of IO relations. From the perspective of complex networks, for the nodes owning diversely weighted edges, we only need to retain a small number of them with extremely large weights, while for the nodes owning similarly weighted edges, many but no more than 50% of them will stay, as shown in Figure 1.
In addition to the case in Figure 1 (a), the larger the Xindex, the more even distribution of edge weights, and the more edges are retained; the smaller the Xindex, the more concentrated distribution of edge weights, and the fewer edges are retained. The range of the Xindex is theoretically (0,0.5]. The core idea of XIFA is as follows: An industrial sector has backward index
The procedure of network pruning based on XIFA.
Procedure  Column Deletion of Input Relations  Row Deletion of Output Relations 

Network  
Refactoring 


Conditions 


Definition 


Pruning 


Merging 


Result 

Note: “descend ()” means to arrange the vectors in descending order.
The basic information of ICIO databases.
Database  Version  Time Span  Country / Region  Industrial Sector  Abbr 

WIOD  2016 Release  2000–2014  44  56  WIOD2016 
2013 Release  1995–2011  41  35  WIOD2013  
OECDTiVA  2020 Release  1995–2018  Unknow  Unknow  TiVA2020 
2018 Release  2005–2015  65  36  TiVA2018  
2016 Release  1995–2011  64  34  TiVA2016  
2015 Release  1995, 2000, 2005, 2008–2011  62  34  TiVA2015  
EoraMRIO  V199.82  1990–2015  189  26  Eora26 
ADBMRIO  Updated to 2019  2000, 2007–2019  63  35  ADB2019 
Note: In addition to sovereign states, the Rest of World (RoW) is taken as an independent economic entity in WIOD, OECD and ADB, most of which belong to the developing countries; In TiVA2018, for “intermediates”, “value added” and “output”, data for Mexico and China are split into MX1, MX2 and CN1, CN2, respectively.
This algorithm serves a dual purpose. One is to retain the key outward connections from the perspective of provider, and the other is to retain the key inward connections from the perspective of consumer. It should be noted that if all the key exportoriented links cannot touch the industrial sectors of some weak countries as consumers, then these sectors will be separated from the GVC, which undermines the integrity of GVC. Therefore, we carry out the merge after network pruning from the perspective of out degree and indegree respectively. However, according to the nature of this algorithm, this pruning method is still based on the relations of adjacent nodes, rather than global information, so it is still possible to ignore the “critical” and “weak” edges.
In consideration of balancing the overall and local, the preferable algorithm in theory for extracting the backbone of GVC with enough structural and functional information has three steps.
Step 1. Find the edges acting as crucial bridges between nodes or communities based on the RFWA.
Step 2. Find the edges retaining the great majority of weights based on the XIFA.
Step 3. Merge the edges of Step 1 and 2 and add their adjacent nodes.
To illustrate this mixed strategy, Figure 2 (a) shows a fivenode directed and weighted network with selfloops, in which the larger the edge weight, the closer the asymmetric interactions. When conducting the SP method, we search globally for key chains seeing both the efficiency and effectiveness. An edge will be preserved if it happens to be located on the most direct path between two nodes and the strongest relevance path as well (see Figure 2 (b)). The FE method, from the perspective of local information, retains the pivotal links that play an important role in the adjacent edges of each node, by examining them in two ways according to the XIFA (see Figure 2 (c)). In this case, the mixed strategy, denoted by SPFE, increases another two edges on the basis of the FEbased backbone (see Figure 2 (d)).
With the availability and utilization of global IO database, especially InterCountry InputOutput (ICIO) tables, relevant research is no longer limited to the economic system of a single country, but extends to interregional and even global economic systems. Typical IO or ICIO table includes three different areas, namely valueadded (
To embody the nonlinear relations between worldwide industrial sectors, we adopt the intermediate use part of ICIO table as the basis of network modeling. In this ICIO network, a sector within a country or region is considered as a node and the interindustry IO relation as an edge, and its edge weight represents the sale and purchase relations between producers and consumers, quantified by cash flux. Thus, a graph
By the comparison of the main ICIO databases available for now, the year of 2014 as the latest and the greatest common divisor exists in World InputOutput Database (WIOD), OECDWTO Database on Trade in ValueAdded (OECDTiVA), Eora MultiRegion InputOutput Table Database (EoraMRIO) and Asian Development Bank Multi Regional InputOutput Tables (ADBMRIO), so we take them as the basis of modeling and empirical analysis. Two of them, WIOD and TiVA, provide more granular sectoral information and are more rooted in official statistics, and are widely used to exploit the analytical richness of tradefocused empirical analysis. Although Eora covers 189 economies, the sectoral data provided is highly aggregated. ADB, as an extension to WIOD, facilitates the production and analysis of GVCrelated statistics for more Asian economies and becomes a rich source of economic information for research and policymaking.
To facilitate the calculation and visualization in empirical analysis, sectors are further merged into four categories, including Agriculture (SC1), Mining (SC2), Manufacturing (SC2) and Services (SC4). The four GIVCN models based on the ICIO databases of the latest version are hence named “GIVCNWIODSC42014”, “GIVCNTiVASC42014”, “GIVCNEoraSC42014” and “GIVCNADBSC42014” respectively, and statistics for the structural properties of full networks includes the number of edges 
Illustration of properties of the original networks.
Networks   
〈 
〈 
〈 

GIVCNWIODSC42014  30,795 (30,976)  1.006 (1.000)  174.972 (176.000)  1.000 (1.000) 
GIVCNTiVASC42014  66,600 (67,600)  1.007 (1.000)  257.143 (260.000)  0.996 (1.000) 
GIVCNEoraSC42014  571,536 (571,536)  1.000 (1.000)  756.000 (756.000)  1.000 (1.000) 
GIVCNADBSC42014  57,761 (63,504)  1.069 (1.000)  231.972 (252.000)  0.971 (1.000) 
If the information of edge weight is not considered, the GIVCN model has a high density of connections, which may give it access to the shorter average path and the greater clustering coefficient (see Table 3). Therefore, both of them are strong evidence that this model is a smallworld network. If the weight is considered, the heavy tails in Figure 4 reveal that the distributions of edge weight are extremely heterogeneous. Most IO relations carry on very little value flow, while some pass massive intermediate goods on the GVC. This phenomenon reflects the GIVCN model is a sort of scalefree network in nature. In pruning, the above two points require us to consider both the removal of redundancy and the retention of weights.
Three backbone extraction methods are applied to the “GIVCN” models together with the simplest GT method and the classical DF method. Despite the subnetworks still look a bit like a ball of yarn, it is possible to analyze the heterogeneous characteristics of global production system based on the simplified topological structure (see Appendix). Given that
Take the GIVCNEoraSC42014 model as an example, in the GT method, the minimum threshold is one tenthousandth of the maximum edge weight, which is equal to 9.728E+05, and the second smallest one is 1.946E+06, and so on. Then, each threshold is selected in order from small to large to conduct network pruning. In other words, the parameters in the GT method can be regarded as specific removal rules. DF method computes the probability (
Table 4 reports the structural similarity of different backbone networks in terms of the percentage of nonisolated nodes (
Structural similarity of various backbones and original network.
Networks  QAP  

GT  DF  SP  FE  SPFE  GT  DF  SP  FE  SPFE  
GIVCNWIODSC42014  88.068  100.000  100.000  100.000  100.000  1.000  0.998  0.994  1.000  1.000 
GIVCNTiVASC42014  91.539  99.615  99.615  99.615  99.615  1.000  0.991  0.996  1.000  1.000 
GIVCNEoraSC42014  65.079  100.000  100.000  100.000  100.000  1.000  1.000  0.999  1.000  1.000 
GIVCNADBSC42014  77.778  98.413  98.810  98.810  98.810  1.000  0.998  0.996  1.000  1.000 
The number of residual edges is recorded in the new network and the percentage of the residual edge weights for the original network is shown in Figure 6. Take the GIVCNEoraSC42014 model as an example, in the network extracted at the global threshold of 9.728E+05 (one tenthousandth of the maximum edge weight), the number of edges quickly decreases from 571,536 to 2,779 and the decline rate reaches 99.510%, while the total edge weight of the network decreases by only 3.370%. Obviously, setting thresholds is acceptable if following the principle of “significantly reducing the number of edges while retaining most of the total edge weight”. The edge filtering procedure retains more edges than setting global thresholds, and the proportion of residual edge weights of the former with
Moreover, we propose the Average Edge Weight (
A thousand times comparisons between each method and null model are shown by boxplot graph in Figure 7. Although all of them precede their null models, the SP method as the best and the most consistent one manifests it catches a small number of critical paths but with extremely high edge weights. In conclusion, among the two parameterless methods, the global SP method can preserve the structural integrity of the network well, but the entire robustness and local function of network are inevitably influenced since it yields so few links. FE method can basically embrace SP and supplement more weight information, and thus has more significance in extracting the backbone of ICIO network.
Modeling and analyzing complex systems based on Complex Network theory has become a research hotspot. With the advent of the era of big data, the rapid expansion of network scale is becoming a new challenge for network science research. Backbone extraction is an adequate tool for studying large network. It is used to detect significant edges and obtain a sparser network with only the crucial information. Different from the existing methods, both the heterogeneity of weights and denseness of ICIO network are considered in this paper. We strive to propose solutions that do not rely on artificial parameters and find a better one between the global and local methods.
The SP algorithm measures the strength of relations across or within sectors from the global perspective of IVC, retaining the links with closer IO relations and lower transaction costs. By combining Hindex and Pareto rule, the FE algorithm performs the filtering process at the nodelocal level and converges naturally without depending on parameters. Through comparison between the two and existing methods, it is proved that the SP algorithm is already an efficient and effective tool, which can not only keep the structural similarity with the original network on the global level but also ensure that retained unit edge carries enough information. However, highefficiency redundancy removal makes it tortuous to reach some marginal nodes inevitably. The FE algorithm covers the results of SP, while retaining higher weight on the premise of the number of remaining edges is close to other methods. Therefore, the FE algorithm can basically replace the mixed strategy and become the better of the two proposed methods.
In addition, the two special network pruning algorithms are useful to explore more possibilities for the study of highly dense and heterogeneously weighted networks, such as information networks, communication networks, citation networks, social media networks, etc. Also, except for visualization and network topology analysis, the obtained sparse network backbone will promote the relevant analyses in terms of community detection, link prediction, spatial econometrics, network embedding edge2eve, etc.
Structural similarity of various backbones and original network.
Networks  QAP  

GT  DF  SP  FE  SPFE  GT  DF  SP  FE  SPFE  
GIVCNWIODSC42014  88.068  100.000  100.000  100.000  100.000  1.000  0.998  0.994  1.000  1.000 
GIVCNTiVASC42014  91.539  99.615  99.615  99.615  99.615  1.000  0.991  0.996  1.000  1.000 
GIVCNEoraSC42014  65.079  100.000  100.000  100.000  100.000  1.000  1.000  0.999  1.000  1.000 
GIVCNADBSC42014  77.778  98.413  98.810  98.810  98.810  1.000  0.998  0.996  1.000  1.000 
Illustration of properties of the FE backbone.
Database   
〈 
〈 
〈 

WIOD  3,825  1.9275  20.8125  0.6231 
TiVA  7,477  1.9451  27.8764  0.6258 
Eora  54,747  1.9504  71.4762  0.5985 
ADB  6,288  1.9675  24.3253  0.616 
Illustration of properties of the original networks.
Networks   
〈 
〈 
〈 

GIVCNWIODSC42014  30,795 (30,976)  1.006 (1.000)  174.972 (176.000)  1.000 (1.000) 
GIVCNTiVASC42014  66,600 (67,600)  1.007 (1.000)  257.143 (260.000)  0.996 (1.000) 
GIVCNEoraSC42014  571,536 (571,536)  1.000 (1.000)  756.000 (756.000)  1.000 (1.000) 
GIVCNADBSC42014  57,761 (63,504)  1.069 (1.000)  231.972 (252.000)  0.971 (1.000) 
Illustration of properties of the SPFE backbone.
Database   
〈 
〈 
〈 

WIOD  3,825  1.9275  20.8125  0.6231 
TiVA  7,477  1.9451  27.8764  0.6258 
Eora  54,753  1.9485  71.4841  0.5993 
ADB  6,288  1.9675  24.3253  0.616 
Illustration of properties of the GT backbone.
Database   
〈 
〈 
〈 

WIOD  2,009  5.4774  6.9659  0.5291 
TiVA  3,211  4.3517  8.5598  0.4798 
Eora  2,779  4.1916  43.8587  0.4397 
ADB  2,361  4.2087  8.0484  0.4858 
The procedure of network pruning based on XIFA.
Procedure  Column Deletion of Input Relations  Row Deletion of Output Relations 

Network  
Refactoring 


Conditions 


Definition 


Pruning 


Merging 


Result 

Illustration of properties of the SP backbone.
Database   
〈 
〈 
〈 

WIOD  533  4.8292  2.3864  0.4569 
TiVA  806  4.9457  2.4093  0.3788 
Eora  2,496  5.1258  2.6481  0.5381 
ADB  752  4.7738  2.3695  0.4463 
The basic information of ICIO databases.
Database  Version  Time Span  Country / Region  Industrial Sector  Abbr 

WIOD  2016 Release  2000–2014  44  56  WIOD2016 
2013 Release  1995–2011  41  35  WIOD2013  
OECDTiVA  2020 Release  1995–2018  Unknow  Unknow  TiVA2020 
2018 Release  2005–2015  65  36  TiVA2018  
2016 Release  1995–2011  64  34  TiVA2016  
2015 Release  1995, 2000, 2005, 2008–2011  62  34  TiVA2015  
EoraMRIO  V199.82  1990–2015  189  26  Eora26 
ADBMRIO  Updated to 2019  2000, 2007–2019  63  35  ADB2019 
Illustration of properties of the DF backbone.
Database   
〈 
〈 
〈 

WIOD  1,385  5.4774  6.9659  0.5291 
TiVA  2,471  4.3517  8.5598  0.4798 
Eora  33,289  4.1916  43.8587  0.4397 
ADB  2,224  4.2087  8.0484  0.4858 