Parameterless Pruning Algorithms for Similarity-Weight Network and Its Application in Extracting the Backbone of Global Value Chain

With trade barriers being removed and driven by the scientific and technological revolution, the global economic integration based on the Global Value Chain (GVC) is already established. In order to analyze various economic phenomena happened within and across countries or regions on the GVC, it is very important to find and adopt appropriate database to illustrate the intricate inter-sector economic activities. In recent years, with the penetration of Complex Network theory in various disciplines, scholars found that the matrix structure of Input-Output (IO) data can be directly or slightly modified to be the adjacency matrix of the network models, which will be the basis of Social Network Analysis (SNA). Unfortunately, lots of related studies were negatively influenced by the features of IO network, which are the highly dense structure and the heterogeneous edge weight distribution. For instance, the degree of each node is the same (equal to the maximum), so as the between centrality, closeness centrality, clustering coefficient, etc. Therefore, we need to propose effective methods to extract the backbone of IO network, by reducing the scale of network and retaining only the meaningful connections.

Ideally, we hope that the crucial information of network could be retained as much as possible with the decline of number of edges. Sometimes, a weighted edge is numerically insignificant but functionally significant, and reckless deletion will result in a useless broken structure. Therefore, the effective network backbone extraction method has to be a trade-off between the number of retained edges and the gross of retained weights.

In this paper, considering that the edges of the IO networks represent the flow of bilateral economic activities, we obtain the IO network backbones by retaining the Strongest Relevance Path Length (SRPL) (Xing et al., 2019) between industry sectors, which is called the Searching Paths (SP) method. In addition, there is an unequal relationship between inputs and outputs. We propose a new heuristic filtering algorithm Filtering Edges (FE) based on the assumption that a minority of edges hold the most weights in the network. The SP method and FE method come from global and local perspectives respectively, and both of them can converge naturally through algorithms rather than rely on artificial parameters.

The remainder of this paper is organized as follows: Section 2 contains the literature review and the shortcomings of the present study. Section 3 introduces the two methods and their mixed strategies, as well as the specific implementation process of network pruning. Section 3 describes the data and modeling process. Section 4 compares the method proposed in this paper with the existing classical methods. Section 5 is the conclusions, in which the main conclusions and further research direction of this paper are summarized.

Literature review

The problems that higher network density causes to SNA are not unique to IO network. For example, regionally clustered edges pose challenges to explicitly depict the overall structure of geographical network (Hennemann, 2013). Higher network density is a serious obstacle for cluster and spectrum analyses, which are usually applied in the sparse networks. Therefore, scholars in different fields including sociology (Nick et al., 2013), economics (Marotta et al., 2015), and computer science (Foti, Hughes, & Rockmore, 2011) proposed a series of methods for extracting network backbones. In detail, these methods can be classified into three categories—network sampling, coarse graining, and filtering. First, Network sampling means obtaining the main network structure through sampling representative nodes or edges, which is extensively overviewed in the study of Ahmed, Neville, and Kompella (2014). Second, to turn the original network into a smaller skeleton, coarse graining means merging nodes or edges according to certain rules (Itzkovitz et al., 2004), such as the attribute of node or edge (Toivonen et al., 2012), community structure (Blagus et al., 2012). The most common one is the filter-based method. It typically defines a criterion for a node or edge to examine their importance or relevance to the network, after which the redundant ones are removed in a stepwise procedure (Dai, Derudder, & Liu, 2018). In practice, there are two filtering perspectives.

An important angle is to preserve the core by removing structures that are under an overall topological characteristic, which can be called the global method. For nodes, a range of indices such as degree (Carmi et al., 2007; Kitsak et al., 2010; Siganos, Tauro, & Faloutsos, 2006), local centrality (Chen et al., 2012) and Leader Rank (Li et al., 2014; Lv et al., 2011) have been applied to identify influential nodes. Further, backbone network can be obtained by reserving the smallest possible subset of highly influential nodes and their relevant links (Malang et al., 2020). As for edges, link weights (Zhang & Zhu, 2013; Zhang et al., 2018; Zhao et al., 2014) and network motifs (Cao, Ding, & Shi, 2019) were used to extract functional backbones while edge betweenness (Kim, Noh, & Jeong, 2004) and the shortest path (Grady, Thiemann, & Brockmann, 2011; Zhang et al., 2014) were used to preserve more structural features. Among them, the simplest method is to remove all edges with weight below a certain threshold, i.e. the Global Threshold (GT) method. Zhang et al. defined h-backbone method that extract both the high-strength links and network bridges, which can preserve important information about both network function and structure (Zhang, Stanley, & Ye, 2018).

The other is to retain the components which are statistically significant at the local level. Disparity filter (DF) is the most representative method that focuses on the local distribution of edge weights, preserving the edges that represent statistically significant deviations with respect to a null model (Serrano, Boguna, & Vespignani, 2009). The improvement of DF comes from different aspects (Coscia & Neffke, 2017). For example, Radicchi, Ramasco, & Fortunato, proposed a weight-filtering technique based on a global null model, which accounts for the full topology while preserving the heterogeneity of the weight distribution (Radicchi, Ramasco, and Fortunato, 2011). Foti, Hughes, and Rockmore presented an improvement relying solely on the empirical distribution to judge statistical significance instead of any null model (Foti, Hughes, & Rockmore, 2011). Bu et al. designed a stricter filter rule and an iterative local search model to avoid outliers and improve efficiency (Bu et al., 2014). Zhang et al. and Cao, J., Ding, C. and Shi, B. identified which edges for each node should be preserved by evaluating whether or not their link involvement or motif wights are compatible with the null model respectively (Cao, Ding, & Shi, 2019; Zhang et al., 2011).

However, most of the previous studies on backbone extraction have the following aspects to be further considered: 1) Most of the proposed methods are general without considering the actual meaning of edge weight. The distribution of industrial relationship in international trade networks, such as IO network, is obviously unequal or heterogeneous. 2) Artificial parameters are required in many methods, which make the backbone network not unique. 3) Both global and local perspectives have their disadvantages. The global methods may overlook the local heterogeneity and multiscale structures of the network, and the local methods may ignore “the crucial weak ties” (Granovetter, 1973) that connect different communities. For these reasons, we need backbone extraction methods that are sensitive to the heterogeneity of weights and independent of parameters.

Methodology

3.1

Searching paths

According to the comparison between real IO relations and SRPL paths in Xing and Han (2021), we propose a global backbone extraction method is efficiently to reserving the main industrial relevance with the fastest reaction speed and the most frequent traffic. (1) $SRP L_{ij}^{(k)} = max_{i, j, k \in {1, 2, \dots, N}} {w_{ij}^{(k - 1)}, \frac{w_{ik}^{(k - 1)} w_{kj}^{(k - 1)}}{w_{ik}^{(k - 1)} + w_{kj}^{(k - 1)}}}$ SRPL_{ij}^{\left( k \right)} = \mathop {\max }\limits_{i,j,k \in \left\{ {1,2, \ldots ,N} \right\}} \left\{ {w_{ij}^{\left( {k - 1} \right)},{{w_{ik}^{\left( {k - 1} \right)}w_{kj}^{\left( {k - 1} \right)}} \over {w_{ik}^{\left( {k - 1} \right)} + w_{kj}^{\left( {k - 1} \right)}}}} \right\} where $\frac{w_{ik}^{(k - 1)} w_{kj}^{(k - 1)}}{w_{ik}^{(k - 1)} + w_{kj}^{(k - 1)}}$ {{w_{ik}^{\left( {k - 1} \right)}w_{kj}^{\left( {k - 1} \right)}} \over {w_{ik}^{\left( {k - 1} \right)} + w_{kj}^{\left( {k - 1} \right)}}} , namely Relevance Path Length (RPL) uses the edge weight product to measure the impact from the industrial structure, and uses the edge weight sum to provide distance information and serve as a barrier function. If it is greater than $w_{ij}^{(k - 1)}$ w_{ij}^{\left( {k - 1} \right)} , we keep the record as it is, otherwise just equalize it to $w_{ij}^{(k - 1)}$ w_{ij}^{\left( {k - 1} \right)} . $SRP L_{ij}^{(k)}$ SRPL_{ij}^{(k)} is the SRPL between nodes i and j, representing the shortest Industrial Value Chain (IVC) with each link between the two industrial sectors closely connected, which achieves convergence through iteration of Revised Floyd-Warshall Algorithm (RFWA) algorithm (Xing et al., 2019). According to the Eq. (1), the SRPL of any two nodes in the global network can be calculated, and finally a new matrix is formed: (2) $SRP L^{(N)} = (\begin{matrix} SRP L_{11}^{(N)} & \dots & SRP L_{N 1}^{(N)} \\ ⋮ & ⋱ & ⋮ \\ SRP L_{N 1}^{(N)} & \dots & SRP L_{NN}^{(N)} \end{matrix})$ SRP{L^{(N)}} = \left( {\matrix{ {SRPL_{11}^{(N)}} & \ldots & {SRPL_{N1}^{(N)}} \cr \vdots & \ddots & \vdots \cr {SRPL_{N1}^{(N)}} & \ldots & {SRPL_{NN}^{(N)}} \cr } } \right)

When the corresponding elements of the new matrix SRPL′ and the edge weight matrix W are exactly equal, it means that the optimal path is the most direct propagation path between the two nodes, that is, there is no need to pass through another node. From the perspective of GVC, we believe that between specific upstream and downstream sectors, whether it is value stream or material stream, it tends to travel along the SRPL path rather than in a random manner.

(3)

{\tilde{w}}_{ij} = {\begin{matrix} w_{ij} & , & w_{ij} = SRP L_{ij}^{(N)} \\ 0 & , & otherwise \end{matrix}

{\tilde w_{ij}} = \left\{ {\matrix{ {{w_{ij}}} & , & {{w_{ij}} = SRPL_{ij}^{(N)}} \cr 0 & , & {otherwise} \cr } } \right.

Taking $\tilde{W} = {{\tilde{W}}_{ij}}$ \tilde W = \left\{ {{{\tilde W}_{ij}}} \right\} as an adjacency matrix, we will acquire a new network model. According to Burt's Structural Holes theory, brokers on the propagation paths have gained two types of advantages, namely, greater information advantages and more intermediate benefits (Burt, 1992). There is no doubt that too many brokers will damage the efficiency of the network. Therefore, the strong correlation in the industrial complex network can be extracted by retaining the direct IO relationship that carries the inter-industry intermediate products.

3.2

Filtering edges

Enlightened by H-Index (Hirsch, 2005) and the Pareto Principle, we present a novel heuristic algorithm to effectively prune the dense and weighted IO network, which is named X-Index Filtering Algorithm (XIFA). XIFA algorithm is a sort of mixed quantitative index that takes into account the scope and intensity of industrial sectors and extracts the main topology of IO network according to the heterogeneity of IO relations. From the perspective of complex networks, for the nodes owning diversely weighted edges, we only need to retain a small number of them with extremely large weights, while for the nodes owning similarly weighted edges, many but no more than 50% of them will stay, as shown in Figure 1.

Three possible situations in the application of XIFA algorithm. Note: (a) The source node has only one weighted edge connected to it, and 100% of its strength is allocated on it; (b) Top 20% of weighted edges carry 80% of the strength of source node. (c) Any 50% of weighted edges carry 50% of the strength of source node.

In addition to the case in Figure 1 (a), the larger the X-index, the more even distribution of edge weights, and the more edges are retained; the smaller the X-index, the more concentrated distribution of edge weights, and the fewer edges are retained. The range of the X-index is theoretically (0,0.5]. The core idea of XIFA is as follows: An industrial sector has backward index x (XI^B) / forward index x (XI^F) if top x percent of its relations to all upstream/downstream sectors occupy at least (1 − x%) of its total input/output amount of intermediate goods (Guan, Ren, & Xing, 2021). Therefore, when performing network pruning, we retain only the edge of the top XI^B present or XI^F present. Formula deduction and practical calculation procedures are presented in Table 1.

Table 1

The procedure of network pruning based on XIFA.

Procedure	Column Deletion of Input Relations	Row Deletion of Output Relations
Network	W = (w_ij)_N×N i, j ∈ [1, N]
Refactoring	$\begin{matrix} {\overset{↼}{w}}_{1} = descend {(w_{11}, w_{21}, \dots, w_{N 1})}^{T} \\ {\overset{↼}{w}}_{2} = descend {(w_{12}, w_{22}, \dots, w_{N 2})}^{T} \\ \dots \\ {\overset{↼}{w}}_{N} = descend {(w_{1 N}, w_{2 N}, \dots, w_{NN})}^{T} \\ \overset{↼}{W} = ({\overset{↼}{w}}_{1}, {\overset{↼}{w}}_{2}, \dots, {\overset{↼}{w}}_{N}) = {({\overset{↼}{w}}_{sj})}_{N \times N} \\ s, j \in [1, N] \end{matrix}$ \matrix{ {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_1} = descend\,{{\left( {{w_{11}},{w_{21}}, \cdots ,{w_{N1}}} \right)}^T}} \cr {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_2} = descend\,{{\left( {{w_{12}},{w_{22}}, \cdots ,{w_{N2}}} \right)}^T}} \cr \cdots \cr {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_N} = descend\,{{\left( {{w_{1N}},{w_{2N}}, \cdots ,{w_{NN}}} \right)}^N}} \cr {\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftharpoonup$}}\over W} = \left( {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftharpoonup$}}\over w} }_1},{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftharpoonup$}}\over w} }_2}, \cdots ,{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftharpoonup$}}\over w} }_N}} \right) = {{\left( {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftharpoonup$}}\over w} }_{sj}}} \right)}_{N \times N}}} \cr {s,j \in \left[ {1,N} \right]} \cr }	$\begin{matrix} {\overset{⇀}{w}}_{1} = descend (w_{11}, w_{12}, \dots, w_{1 N}) \\ {\overset{⇀}{w}}_{2} = descend (w_{21}, w_{22}, \dots, w_{2 N}) \\ \dots \\ {\overset{⇀}{w}}_{N} = descend (w_{N 1}, w_{N 2}, \dots, w_{NN}) \\ \overset{⇀}{W} \overset{\leftarrow}{} = (\begin{matrix} {\overset{⇀}{w}}_{1} \\ {\overset{⇀}{w}}_{2} \\ \dots \\ {\overset{⇀}{w}}_{N} \end{matrix}) = {({\overset{⇀}{w}}_{it})}_{N \times N} \\ i, t \in [1, N] \end{matrix}$ \matrix{ {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_1} = descend\,\left( {{w_{11}},{w_{12}}, \cdots ,{w_{1N}}} \right)} \cr {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_2} = descend\,\left( {{w_{21}},{w_{22}}, \cdots ,{w_{2N}}} \right)} \cr \cdots \cr {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_N} = descend\,\left( {{w_{N1}},{w_{N2}}, \cdots ,{w_{NN}}} \right)} \cr {\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over W} \overleftarrow {} = \left( {\matrix{ {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_1}} \cr {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_2}} \cr \ldots \cr {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_N}} \cr } } \right) = {{\left( {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_{it}}} \right)}_{N \times N}}} \cr {i,t \in \left[ {1,N} \right]} \cr }
Conditions	$\begin{matrix} \forall a_{1}, a_{2}, \dots, a_{j} \in [1, N] \\ {\begin{matrix} \frac{\sum_{s = 1}^{a_{j}} {\overset{↼}{w}}_{sj}}{\sum_{s = 1}^{N} {\overset{↼}{w}}_{sj}} \geq 1 - \frac{a_{j}}{N} \\ \frac{\sum_{s = 1}^{a_{j} - 1} {\overset{↼}{w}}_{sj}}{\sum_{s = 1}^{N} {\overset{↼}{w}}_{sj}} < 1 - \frac{a_{j} - 1}{N} \end{matrix} \end{matrix}$ \matrix{ {\forall {a_1},{a_2}, \cdots ,{a_j} \in \left[ {1,N} \right]} \cr {\left\{ {\matrix{ {{{\sum\nolimits_{s = 1}^{{a_j}} {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftharpoonup$}}\over w} }_{sj}}} } \over {\sum\nolimits_{s = 1}^N {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftharpoonup$}}\over w} }_{sj}}} }} \ge 1 - {{{a_j}} \over N}} \cr {{{\sum\nolimits_{s = 1}^{{a_j} - 1} {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftharpoonup$}}\over w} }_{sj}}} } \over {\sum\nolimits_{s = 1}^N {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftharpoonup$}}\over w} }_{sj}}} }} < 1 - {{{a_j} - 1} \over N}} \cr } } \right.} \cr }	$\begin{matrix} \forall b_{1}, b_{2}, \dots, b_{j} \in [1, N] \\ {\begin{matrix} \frac{\sum_{t = 1}^{b_{i}} {\overset{⇀}{w}}_{it}}{\sum_{t = 1}^{N} {\overset{⇀}{w}}_{it}} \geq 1 - \frac{b_{i}}{N} \\ \frac{\sum_{t = 1}^{b_{i} - 1} {\overset{⇀}{w}}_{it}}{\sum_{t = 1}^{N} {\overset{⇀}{w}}_{it}} < 1 - \frac{b_{i} - 1}{N} \end{matrix} \end{matrix}$ \matrix{ {\forall {b_1},{b_2}, \cdots ,{b_j} \in \left[ {1,N} \right]} \cr {\left\{ {\matrix{ {{{\sum\nolimits_{t = 1}^{{b_i}} {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_{it}}} } \over {\sum\nolimits_{t = 1}^N {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_{it}}} }} \ge 1 - {{{b_i}} \over N}} \cr {{{\sum\nolimits_{t = 1}^{{b_i} - 1} {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_{it}}} } \over {\sum\nolimits_{t = 1}^N {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_{it}}} }} < 1 - {{{b_i} - 1} \over N}} \cr } } \right.} \cr }
Definition	$\begin{matrix} X I_{j}^{B} = \frac{a_{j}}{N} \\ X I^{B} = {(X I_{j}^{B})}_{N \times 1} \end{matrix}$ \matrix{ {XI_j^B = {{{a_j}} \over N}} \cr {X{I^B} = {{\left( {XI_j^B} \right)}_{N \times 1}}} \cr }	$\begin{matrix} X I_{i}^{F} = \frac{b_{i}}{N} \\ X I^{F} = {(X I_{i}^{F})}_{N \times 1} \end{matrix}$ \matrix{ {XI_i^F = {{{b_i}} \over N}} \cr {X{I^F} = {{\left( {XI_i^F} \right)}_{N \times 1}}} \cr }
Pruning	${\overset{\leftarrow}{w}}_{ij} = {\begin{matrix} w_{ij} & , & w_{ij} = {\overset{\leftarrow}{w}}_{sj} and s \leq a_{j} \\ 0 & , & otherwise \end{matrix}$ {\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftarrow$}}\over w} _{ij}} = \left\{ {\matrix{ {{w_{ij}}} & , & {{w_{ij}} = {{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftharpoonup$}}\over w} }_{sj}}\,and\,s\, \le {a_j}} \cr 0 & , & {otherwise} \cr } } \right.	${\vec{w}}_{ij} = {\begin{matrix} w_{ij} & , & w_{ij} = {\overset{⇀}{w}}_{it} and t \leq b_{i} \\ 0 & , & otherwise \end{matrix}$ {\vec w_{ij}} = \left\{ {\matrix{ {{w_{ij}}} & , & {{w_{ij}} = {{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}}\over w} }_{it}}\,and\,t\, \le {b_i}} \cr 0 & , & {otherwise} \cr } } \right.
Merging	${\overset{\leftrightarrow}{w}}_{ij} = {\begin{matrix} w_{ij} & , 0 & {\overset{\leftarrow}{w}}_{ij} \neq 0 or {\vec{w}}_{it} \neq \\ 0 & , & otherwise \end{matrix}$ {\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftrightarrow$}}\over w} _{ij}} = \left\{ {\matrix{ {{w_{ij}}} & {,0} & {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftarrow$}}\over w} }_{ij}} \ne 0\,or\,{{\vec w}_{it}}\, \ne } \cr 0 & , & {otherwise} \cr } } \right.
Result	$\overset{\leftrightarrow}{W} = {({\overset{\leftrightarrow}{w}}_{ij})}_{N \times N}$ \mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftrightarrow$}}\over W} = {\left( {{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\leftrightarrow$}}\over w} }_{ij}}} \right)_{N \times N}}

Note: “descend ()” means to arrange the vectors in descending order.

Table 2

The basic information of ICIO databases.

Database	Version	Time Span	Country / Region	Industrial Sector	Abbr
WIOD	2016 Release	2000–2014	44	56	WIOD2016
WIOD	2013 Release	1995–2011	41	35	WIOD2013
OECD-TiVA	2020 Release	1995–2018	Unknow	Unknow	TiVA2020
	2018 Release	2005–2015	65	36	TiVA2018
	2016 Release	1995–2011	64	34	TiVA2016
	2015 Release	1995, 2000, 2005, 2008–2011	62	34	TiVA2015
Eora-MRIO	V199.82	1990–2015	189	26	Eora26
ADB-MRIO	Updated to 2019	2000, 2007–2019	63	35	ADB2019

Note: In addition to sovereign states, the Rest of World (RoW) is taken as an independent economic entity in WIOD, OECD and ADB, most of which belong to the developing countries; In TiVA2018, for “intermediates”, “value added” and “output”, data for Mexico and China are split into MX1, MX2 and CN1, CN2, respectively.

This algorithm serves a dual purpose. One is to retain the key outward connections from the perspective of provider, and the other is to retain the key inward connections from the perspective of consumer. It should be noted that if all the key export-oriented links cannot touch the industrial sectors of some weak countries as consumers, then these sectors will be separated from the GVC, which undermines the integrity of GVC. Therefore, we carry out the merge after network pruning from the perspective of out- degree and in-degree respectively. However, according to the nature of this algorithm, this pruning method is still based on the relations of adjacent nodes, rather than global information, so it is still possible to ignore the “critical” and “weak” edges.

3.3

Mixed strategy

In consideration of balancing the overall and local, the preferable algorithm in theory for extracting the backbone of GVC with enough structural and functional information has three steps.

Step 1. Find the edges acting as crucial bridges between nodes or communities based on the RFWA.

Step 2. Find the edges retaining the great majority of weights based on the XIFA.

Step 3. Merge the edges of Step 1 and 2 and add their adjacent nodes.

To illustrate this mixed strategy, Figure 2 (a) shows a five-node directed and weighted network with self-loops, in which the larger the edge weight, the closer the asymmetric interactions. When conducting the SP method, we search globally for key chains seeing both the efficiency and effectiveness. An edge will be preserved if it happens to be located on the most direct path between two nodes and the strongest relevance path as well (see Figure 2 (b)). The FE method, from the perspective of local information, retains the pivotal links that play an important role in the adjacent edges of each node, by examining them in two ways according to the XIFA (see Figure 2 (c)). In this case, the mixed strategy, denoted by SPFE, increases another two edges on the basis of the FE-based backbone (see Figure 2 (d)).

Getting the SPFE from an original network.

Data and modeling

With the availability and utilization of global IO database, especially Inter-Country Input-Output (ICIO) tables, relevant research is no longer limited to the economic system of a single country, but extends to inter-regional and even global economic systems. Typical IO or ICIO table includes three different areas, namely value-added (VA), intermediate use (Z) and final demand (Y). It is possible that the whole global economic system can be abstracted to a Multi-Layer Network as shown in Figure 3 (b), which includes three layers: the value-added layer, the intermediate use layer, and the final demand layer. Commonly, the GVC-related researches are concerned with the global production system, which is well profiled by the intermediate use part of ICIO table as shown in Figure 3 (a).

To embody the non-linear relations between worldwide industrial sectors, we adopt the intermediate use part of ICIO table as the basis of network modeling. In this ICIO network, a sector within a country or region is considered as a node and the inter-industry IO relation as an edge, and its edge weight represents the sale and purchase relations between producers and consumers, quantified by cash flux. Thus, a graph G = (V, E, W) containing n nodes is drawn and denotes a node set V. Pairs of nodes are linked by edges to reflect their interdependencies, thereby forming an asymmetric edge set E. In this valued graph, the function of set E can be substituted by that of weight set W, which can be directly extracted from the region of inter-country and inter-industry use and supply in ICIO table. Finally, we name this directed and weighted network the Global Industrial Value Chain Network (GIVCN) model as shown in Figure 3 (c).

Relations between ICIO table, multi-layer and single-layer ICIO network.

By the comparison of the main ICIO databases available for now, the year of 2014 as the latest and the greatest common divisor exists in World Input-Output Database (WIOD), OECD-WTO Database on Trade in Value-Added (OECD-TiVA), Eora Multi-Region Input-Output Table Database (Eora-MRIO) and Asian Development Bank Multi- Regional Input-Output Tables (ADB-MRIO), so we take them as the basis of modeling and empirical analysis. Two of them, WIOD and TiVA, provide more granular sectoral information and are more rooted in official statistics, and are widely used to exploit the analytical richness of trade-focused empirical analysis. Although Eora covers 189 economies, the sectoral data provided is highly aggregated. ADB, as an extension to WIOD, facilitates the production and analysis of GVC-related statistics for more Asian economies and becomes a rich source of economic information for research and policymaking.

To facilitate the calculation and visualization in empirical analysis, sectors are further merged into four categories, including Agriculture (SC1), Mining (SC2), Manufacturing (SC2) and Services (SC4). The four GIVCN models based on the ICIO databases of the latest version are hence named “GIVCN-WIOD-SC4-2014”, “GIVCN-TiVA-SC4-2014”, “GIVCN-Eora-SC4-2014” and “GIVCN-ADB-SC4-2014” respectively, and statistics for the structural properties of full networks includes the number of edges |E|, average distance 〈d〉, mean node degree 〈K〉 and clustering coefficient 〈C〉, which are shown in Table 3.

Table 3

Illustration of properties of the original networks.

Networks	\|E\|	〈d〉	〈K〉	〈C〉
GIVCN-WIOD-SC4-2014	30,795 (30,976)	1.006 (1.000)	174.972 (176.000)	1.000 (1.000)
GIVCN-TiVA-SC4-2014	66,600 (67,600)	1.007 (1.000)	257.143 (260.000)	0.996 (1.000)
GIVCN-Eora-SC4-2014	571,536 (571,536)	1.000 (1.000)	756.000 (756.000)	1.000 (1.000)
GIVCN-ADB-SC4-2014	57,761 (63,504)	1.069 (1.000)	231.972 (252.000)	0.971 (1.000)

If the information of edge weight is not considered, the GIVCN model has a high density of connections, which may give it access to the shorter average path and the greater clustering coefficient (see Table 3). Therefore, both of them are strong evidence that this model is a small-world network. If the weight is considered, the heavy tails in Figure 4 reveal that the distributions of edge weight are extremely heterogeneous. Most IO relations carry on very little value flow, while some pass massive intermediate goods on the GVC. This phenomenon reflects the GIVCN model is a sort of scale-free network in nature. In pruning, the above two points require us to consider both the removal of redundancy and the retention of weights.

Edge weight distribution of ICIO networks under the double logarithmic coordinate.

Results and discussions

5.1

Preservation of structural information

Three backbone extraction methods are applied to the “GIVCN-” models together with the simplest GT method and the classical DF method. Despite the subnetworks still look a bit like a ball of yarn, it is possible to analyze the heterogeneous characteristics of global production system based on the simplified topological structure (see Appendix). Given that E_SP and E_FE are the edge sets of GIVCN-SP-2014 model and GIVCN-FE-2014 model, respectively, that of the whole one will be E_SPFE = E_FE ∪ E_SP. We use three sorts of colored lines to differentiate which model the links come from, and Figure 5 shows that GIVCN-SP-2014 is completely contained in GIVCN-FE-2014 in most cases. As an exception, GIVCN-Eora-SC4-2014 has six edges that only belong to the sub-network based on SP method, because the size of the model increases the heterogeneity of edge weight. After all, there are always some puny industry sectors or industrial linkages that exist independently, but it is really difficult to capture them from the perspective of weight.

MDS of GIVCN-SPFE-2014 models. Note: E_G denoted by the green line is the intersection of E_SP and E_FE EFE, i.e., E_G = E_FE ∩ E_SP; E_B denoted by the blue line is the edge set only belonging to E_FE but not E_SP, i.e., E_B = E_FE – E_G; E_R denoted by the red line, contrary to E_G, is the edge set only belonging to E_SP but not E_FE, i.e., E_R = E_SP – E_G.

Take the GIVCN-Eora-SC4-2014 model as an example, in the GT method, the minimum threshold is one ten-thousandth of the maximum edge weight, which is equal to 9.728E+05, and the second smallest one is 1.946E+06, and so on. Then, each threshold is selected in order from small to large to conduct network pruning. In other words, the parameters in the GT method can be regarded as specific removal rules. DF method computes the probability (p_{ij= (1−w_ij/s_i)^k_i⁻¹} that such a value is compatible with the null model: if p_ij < α, where α is a parameter representing the desired significance level, the edge is preserved, otherwise it is removed. Lower values of α lead to sparser networks, as fewer edges are preserved. Although an edge is connected to two nodes, its direction determines there exists only one value for p_ij, that is, based on the out-degree and out-strength of the source node.

Table 4 reports the structural similarity of different backbone networks in terms of the percentage of non-isolated nodes (N_t%) and their Quadratic Assignment Procedure (QAP) correlation with the original network, where the QAP is an often-used approach to measure the extent to which two networks are correlated or have a similar pattern of connections (Dai, Derudder, & Liu, 2018). According to the results, the QAP correlations of the five algorithms (all P value < 0.001) are extremely high, that is, they can all maintain a high similarity with the original network in the overall structure level. Except for GT, other methods can better protect the bridges in the network and hardly make nodes isolated.

Table 4

Structural similarity of various backbones and original network.

Networks	N_t% no self loops					QAP

	GT	DF	SP	FE	SPFE	GT	DF	SP	FE	SPFE
GIVCN-WIOD-SC4-2014	88.068	100.000	100.000	100.000	100.000	1.000	0.998	0.994	1.000	1.000
GIVCN-TiVA-SC4-2014	91.539	99.615	99.615	99.615	99.615	1.000	0.991	0.996	1.000	1.000
GIVCN-Eora-SC4-2014	65.079	100.000	100.000	100.000	100.000	1.000	1.000	0.999	1.000	1.000
GIVCN-ADB-SC4-2014	77.778	98.413	98.810	98.810	98.810	1.000	0.998	0.996	1.000	1.000

5.2

Preservation of functional information

The number of residual edges is recorded in the new network and the percentage of the residual edge weights for the original network is shown in Figure 6. Take the GIVCN-Eora-SC4-2014 model as an example, in the network extracted at the global threshold of 9.728E+05 (one ten-thousandth of the maximum edge weight), the number of edges quickly decreases from 571,536 to 2,779 and the decline rate reaches 99.510%, while the total edge weight of the network decreases by only 3.370%. Obviously, setting thresholds is acceptable if following the principle of “significantly reducing the number of edges while retaining most of the total edge weight”. The edge filtering procedure retains more edges than setting global thresholds, and the proportion of residual edge- weights of the former with α ≥ 0.3 or α ≤ 0.1 is higher or lower than the latter. The proportion of residual edge weights is only 87.758% while simplifying the GIVCN model by the SP. However, with the addition of FE method, GIVCN-SPFE can retain the weight to the maximum extent, almost reaching 99.154%.

Comparative result of network pruning of GIVCN models. Note: Since the pruning results of SPFE and FE method are almost the same in our models, the FE method is not shown in the figure.

Moreover, we propose the Average Edge Weight (AEW) to measure the weight information included by unit edge: (4) $AEW = \frac{\sum_{(i, j) \in E} w_{(i, j)}}{| E |}$ AEW = {{\sum\nolimits_{\left( {i,j} \right) \in E} {{w_{\left( {i,j} \right)}}} } \over {\left| E \right|}} where |E| is the number of edges in E and w_(i,j) is the weight of edge (i, j). Moreover, given that AEW_np represents the average edge weight of the backbones which extracted by the pruning methods and AEW_rand is the result obtained in a null model where the edges with the same proportion as skeleton networks are randomly removed, then the ratio of them, AEW_ratio = AEW_np/AEW_rand, is of great significance to evaluate a specific method's efficiency. Deviations of AEW_ratio above or below 1 demonstrate positive or negative pruning results, respectively.

A thousand times comparisons between each method and null model are shown by boxplot graph in Figure 7. Although all of them precede their null models, the SP method as the best and the most consistent one manifests it catches a small number of critical paths but with extremely high edge weights. In conclusion, among the two parameterless methods, the global SP method can preserve the structural integrity of the network well, but the entire robustness and local function of network are inevitably influenced since it yields so few links. FE method can basically embrace SP and supplement more weight information, and thus has more significance in extracting the backbone of ICIO network.

The ratio of the average edge weight of the pruning methods and the null model.

Conclusions

Modeling and analyzing complex systems based on Complex Network theory has become a research hotspot. With the advent of the era of big data, the rapid expansion of network scale is becoming a new challenge for network science research. Backbone extraction is an adequate tool for studying large network. It is used to detect significant edges and obtain a sparser network with only the crucial information. Different from the existing methods, both the heterogeneity of weights and denseness of ICIO network are considered in this paper. We strive to propose solutions that do not rely on artificial parameters and find a better one between the global and local methods.

The SP algorithm measures the strength of relations across or within sectors from the global perspective of IVC, retaining the links with closer IO relations and lower transaction costs. By combining H-index and Pareto rule, the FE algorithm performs the filtering process at the node-local level and converges naturally without depending on parameters. Through comparison between the two and existing methods, it is proved that the SP algorithm is already an efficient and effective tool, which can not only keep the structural similarity with the original network on the global level but also ensure that retained unit edge carries enough information. However, high-efficiency redundancy removal makes it tortuous to reach some marginal nodes inevitably. The FE algorithm covers the results of SP, while retaining higher weight on the premise of the number of remaining edges is close to other methods. Therefore, the FE algorithm can basically replace the mixed strategy and become the better of the two proposed methods.

In addition, the two special network pruning algorithms are useful to explore more possibilities for the study of highly dense and heterogeneously weighted networks, such as information networks, communication networks, citation networks, social media networks, etc. Also, except for visualization and network topology analysis, the obtained sparse network backbone will promote the relevant analyses in terms of community detection, link prediction, spatial econometrics, network embedding edge2eve, etc.

eISSN:: 2543-683X
Idioma:: Inglés

Calendario de la edición:: 4 veces al año
Temas de la revista:: Computer Sciences, Information Technology, Project Management, Databases and Data Mining

RSS Feed de revista

Parameterless Pruning Algorithms for Similarity-Weight Network and Its Application in Extracting the Backbone of Global Value Chain

Article Category: Research Paper

Publicado en línea: 11 dic 2021

Páginas: 57 - 75

Recibido: 11 jul 2021

Aceptado: 11 oct 2021

DOI: https://doi.org/10.2478/jdis-2022-0002

Palabras claveInter-Country Input-Output Table, Global Production System, Global Value Chain Network, Backbone Extraction

© 2022 Lizhi Xing et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Palabras clave
Inter-Country Input-Output Table, Global Production System, Global Value Chain Network, Backbone Extraction