Extended Lorenz majorization and frequencies of distances in an undirected network

Findings: We show that the distance distribution in an undirected network Lorenz majorizes the one of a chain. As a consequence, the average and median distances in any such network are smaller than or equal to those of a chain. Research limitations: We restricted our investigations to undirected, unweighted networks. Practical implications: We are convinced that these results are useful in the study of small worlds and the so-called six degrees of separation property.


Introduction
Let G=(V,E) be an undirected network, where $V = (v_k)_{k=1,\ldots,N}$ denotes the set of nodes or vertices and E denotes the set of links or edges. As collaboration (of scientists, universities, countries), bibliographic coupling (of articles, books), and co-citation (of articles, books) are all examples of undirected networks, it goes without saying that the study of these networks is of great importance for bibliometrics (Rousseau et al., 2018).
We assume that #V = N > 1. A chain in a network is a sequence of different nodes connected one by one by edges. The distance d between two nodes is equal to the number of links on a shortest chain (often called a shortest path) between these two nodes. Consequently, the distance between two nodes connected by an edge is equal to one.
Each network studied in this article is assumed to be connected, i.e., there is a chain between any two nodes. Hence, for each node, there exists another node at distance one. The total number of distances between pairs of nodes in this network is equal to N(N-1)/2, where, for $v_1, v_2 \in V$, $d(v_1,v_2) = d(v_2,v_1)$ is considered only once.
Notation. We denote by $\alpha_j$, j = 1,…, N-1, the number of times distance j occurs in network G. The array $A = (\alpha_1, \alpha_2, \ldots, \alpha_{N-1})$ is called the α-array of the network G.
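As an illustration of this notation, the following minimal sketch computes the α-array of a small network by running a breadth-first search from every node. The function name `distance_frequencies`, the labelling of nodes as 0, …, N-1, and the edge-list input format are assumptions made for this example only; they do not come from the article.

```python
from collections import deque


def distance_frequencies(n, edges):
    """Return [alpha_1, ..., alpha_{n-1}] for a connected, undirected,
    unweighted network on nodes 0..n-1 (hypothetical helper)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    alpha = [0] * (n - 1)
    for source in range(n):
        # Breadth-first search gives shortest-chain lengths from `source`.
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[w] == -1:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if -1 in dist:
            raise ValueError("network is not connected")
        # Count each unordered pair {source, target} only once.
        for target in range(source + 1, n):
            alpha[dist[target] - 1] += 1
    return alpha


chain = [(k, k + 1) for k in range(4)]                          # 5-node chain
complete = [(u, v) for u in range(5) for v in range(u + 1, 5)]  # K_5
print(distance_frequencies(5, chain))     # [4, 3, 2, 1]
print(distance_frequencies(5, complete))  # [10, 0, 0, 0]
```

In both cases the components sum to N(N-1)/2 = 10, the total number of distances.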

Some immediate properties
We give a short proof of 5)  2 ≤ , which is a contradiction.

6) In a chain of length j there are 2 chains of length j-1, 3 chains of length j-2, and k chains of length j-k+1 (0 < k < j).
7) The distance frequency array of a complete N-node network, $K_N$, is $\left(\frac{N(N-1)}{2}, 0, \ldots, 0\right)$.
We recall the definition of the majorization order (Hardy et al., 1934). Let $X = (x_j)$ and $Y = (y_j)$, j = 1,…, N-1, be two (N-1)-sequences of nonnegative numbers, ordered decreasingly. Then X majorizes Y, denoted as X ⋟ Y, if
$$\sum_{j=1}^{i} x_j \geq \sum_{j=1}^{i} y_j \quad (i = 1, \ldots, N-2)$$
and
$$\sum_{j=1}^{N-1} x_j = \sum_{j=1}^{N-1} y_j.$$
We recall that if X ⋟ Y then the (standard) Lorenz curve of X (Lorenz, 1905) is situated above the Lorenz curve of Y. We now extend the notions of majorization and Lorenz curve by removing the requirement to be arranged in decreasing order.

Definition. Extended majorization order
Let $X = (x_j)$ and $Y = (y_j)$, j = 1,…, N-1, be two (N-1)-sequences of nonnegative numbers. Then X majorizes Y (in the extended sense), denoted as X ⋟ Y (we keep the same notation), if
$$\sum_{j=1}^{i} x_j \geq \sum_{j=1}^{i} y_j \quad (i = 1, \ldots, N-2)$$
and
$$\sum_{j=1}^{N-1} x_j = \sum_{j=1}^{N-1} y_j.$$

Definition. Extended Lorenz curve
Let $X = (x_j)$ be an (N-1)-sequence of nonnegative numbers and let
$$s_j = \sum_{k=1}^{j} x_k$$
be the j-th partial sum. Hence $s_{N-1}$ = TOT denotes the total sum of the numbers in X; $s_0$ is set equal to 0. Now plot the points
$$\left(\frac{j}{N-1}, \frac{s_j}{\mathrm{TOT}}\right), \quad j = 0, 1, \ldots, N-1,$$
and connect them by line segments to obtain a curve joining the origin (0,0) with the point (1,1). We refer to this curve as the extended Lorenz curve. Contrary to the standard Lorenz curve, this curve is not necessarily concave (but of course still increasing). An example is shown in Fig. 3. If X is decreasing then the extended Lorenz curve coincides with the classical Lorenz curve. If X ⋟ Y then the extended Lorenz curve of X is situated above the extended Lorenz curve of Y.
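The two definitions can be made concrete with a short sketch. It assumes that extended majorization means the usual partial-sum condition together with equal totals, applied to the sequences in their given order, and that the curve is drawn through the points (j/(N-1), s_j/TOT). The function names `extended_majorizes` and `extended_lorenz_points` are hypothetical, and exact fractions are used only to keep the output readable.

```python
from fractions import Fraction
from itertools import accumulate


def extended_majorizes(x, y):
    """True if x majorizes y in the extended sense: every partial sum of x
    is at least that of y and the totals agree (assumed definition)."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    sx, sy = list(accumulate(x)), list(accumulate(y))
    return sx[-1] == sy[-1] and all(a >= b for a, b in zip(sx, sy))


def extended_lorenz_points(x):
    """Points (j/n, s_j/TOT), j = 0..n, with n = len(x) and TOT > 0.
    The sequence is NOT reordered before taking partial sums."""
    n = len(x)
    partial = [0] + list(accumulate(x))
    total = partial[-1]
    return [(Fraction(j, n), Fraction(s, total)) for j, s in enumerate(partial)]


a = [6, 7, 6, 2, 0, 0]   # a non-decreasing-ordered example sequence
c = [6, 5, 4, 3, 2, 1]   # distance frequencies of a 7-node chain
print(extended_majorizes(a, c))   # True
print(extended_majorizes(c, a))   # False
print(extended_lorenz_points(a)[:3])
```

Because `a` is not arranged in decreasing order, its curve is not concave: the second segment (slope proportional to 7) is steeper than the first (slope proportional to 6).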

The main result
Given the number of nodes, N, we next show a majorization result between the frequency sequence of a complete network $K_N$, that of a general network G=(V,E), denoted as A, and the frequency sequence C of a chain.

Theorem
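Given a network G=(V,E) with N nodes, then
$$\left(\frac{N(N-1)}{2}, 0, \ldots, 0\right) \succeq A \succeq C.$$
By (8) and (9) the second inequality means that, with C = (N-1, N-2, …, 1),
$$\forall\, i \in \{1, \ldots, N-1\}: \quad \sum_{j=1}^{i} \alpha_j \geq \sum_{j=1}^{i} (N-j).$$
We moreover prove that […]
Proof. By (1), we already know that […], which proves (11). Assume now that $\alpha_i = 0$ for some i > 1. Then we already know that $\forall\, j \geq i: \alpha_j = 0$ and thus
$$\sum_{j=1}^{i} \alpha_j = \frac{N(N-1)}{2} = \sum_{j=1}^{N-1} (N-j) \geq \sum_{j=1}^{i} (N-j).$$
Assume now that $\alpha_i \neq 0$ for some i > 1. We will prove that the inequality also holds in this case. This will be done in several steps. First, we show that
$$\sum_{j=1}^{i} \alpha_j \geq \frac{i(i+1)}{2}.$$
Indeed: there exists in V at least one chain of length i, connecting nodes to which we refer as $u_1, u_2, \ldots, u_{i+1}$. Then, by property 6, we know that $\forall\, j = 1, \ldots, i: \alpha_j \geq i - j + 1$, and hence
$$\sum_{j=1}^{i} \alpha_j \geq \sum_{j=1}^{i} (i - j + 1) = \frac{i(i+1)}{2}.$$
In the next step, we show that this inequality can be refined to
$$\sum_{j=1}^{i} \alpha_j \geq \frac{i(i+1)}{2} + i.$$
Indeed, as the network under study is connected, there exists a node in the network, denoted as $u_{i+2}$, connected to at least one node of the chain $u_1, u_2, \ldots, u_{i+1}$. This point $u_{i+2}$ has a distance d, 0 < d ≤ i, to at least i points in the chain. Now, adding the i distances involving the point $u_{i+2}$ we obtain
$$\sum_{j=1}^{i} \alpha_j \geq \frac{i(i+1)}{2} + i.$$
We further note that $\forall\, j \leq i$, each point in the set $S = \{u_1, u_2, \ldots, u_{i+2}\}$ has at least j points in the set S at a distance 0 < d ≤ j. Repeating this step until all N nodes are included gives
$$\sum_{j=1}^{i} \alpha_j \geq \frac{i(i+1)}{2} + (N-i-1)\,i = \sum_{j=1}^{i} (N-j),$$
which proves the inequality in the case $\alpha_i \neq 0$, and hence (10). Now we prove (12). Using (10) and (11) we have: […] where we still have to prove the final inequality in (12). For this, we first observe that: […]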

Remarks and consequences
1) It follows immediately from the previous theorem that for a given number N and α-array $A = (\alpha_1, \alpha_2, \ldots, \alpha_{N-1})$ the median of A is smaller than or equal to the median of the chain of length N-1 (N nodes).

5) If A is an array of length N-1, consisting of non-negative integers such that A ⋟ C, then the components of A do not have to be frequencies of distances in a network. Indeed, let N = 4 and let A = (4,1,1), then (4,1,1) ⋟ C = (3,2,1).
Yet, there does not exist a network with (4,1,1) as distance frequencies: the third component is equal to one, indicating that the network must be a chain, but for a chain with 4 nodes, $\alpha_1 = 3$ and not 4.
Even if the last component of A is zero a counterexample is possible. Let N = 5 and A = (4,3,3,0), so that A ⋟ C = (4,3,2,1). As the first component equals N-1 = 4, the network has no cycles, and as the third component is nonzero while the fourth is zero, it contains a chain of four nodes but is not itself a chain.
The fifth node must be connected to the second or the third node in the chain. Hence A must necessarily be (4,4,2,0) and cannot be (4,3,3,0).
These examples lead to the open question of finding the conditions under which such an array A is the frequency array of the distances in a (connected) network.
6) From the above and the main theorem we see that max {Md; Md is the median distance in an N-node network} is strictly smaller than max {$\bar{d}$; $\bar{d}$ is the average distance of an N-node network}. The inequality Md ≤ $\bar{d}$ is not always true: a star with a center and N-1 rays (N>4) is a counterexample (Md = 2 and $\bar{d} = 2(N-1)/N < 2$). However, if the α-sequence of a network is decreasing then clearly Md ≤ $\bar{d}$. The converse does not hold in general. This is illustrated by $G_0$ (N=7) in Fig. 2 below. Its α-sequence is not decreasing, namely (6,7,6,2,0,0), and yet Md = 2 < $\bar{d}$ = 46/21 ≈ 2.19. Here $\bar{d} = \frac{2}{N(N-1)} \sum_{j=1}^{N-1} j\,\alpha_j$ denotes the average distance between nodes in the network.
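The numbers quoted in this remark can be reproduced from the frequency arrays alone. The sketch below is illustrative: the names `mean_distance` and `median_distance` are hypothetical, and the median of an even number of distances is taken as the average of the two middle values, which is an assumption about the convention used.

```python
from fractions import Fraction


def mean_distance(alpha):
    """Average distance when distance j occurs alpha[j-1] times."""
    weighted = sum(j * a for j, a in enumerate(alpha, start=1))
    return Fraction(weighted, sum(alpha))


def median_distance(alpha):
    """Median of the list in which distance j is repeated alpha[j-1] times
    (assumed convention: average the two middle values for even length)."""
    values = [j for j, a in enumerate(alpha, start=1) for _ in range(a)]
    m = len(values)
    if m % 2:
        return Fraction(values[m // 2])
    return Fraction(values[m // 2 - 1] + values[m // 2], 2)


star7 = [6, 15, 0, 0, 0, 0]   # star with a center and 6 rays (N = 7)
g0 = [6, 7, 6, 2, 0, 0]       # array quoted for the 7-node example network
chain7 = [6, 5, 4, 3, 2, 1]   # 7-node chain

print(median_distance(star7), mean_distance(star7))  # 2 12/7
print(median_distance(g0), mean_distance(g0))        # 2 46/21
print(mean_distance(chain7))                         # 8/3
```

For the star, 12/7 equals 2(N-1)/N with N = 7, so the median exceeds the average, while for the second array the median stays below the average even though the sequence is not decreasing.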

Conclusion
In this article, we introduced the study of the distance distribution of a network. We showed that the distance distribution in an undirected network majorizes the one of a chain and is always smaller (in the sense of majorization) than the distribution of the corresponding complete N-node network. The Gini coefficient respects the majorization order for such distributions, while the average distance behaves oppositely. As a consequence, the average and median distances in any such network are smaller than or equal to those of a chain.
We intend to use these results in the study of small worlds and the so-called six degrees of separation property (work in preparation).
Acknowledgement. The author thanks his colleague Ronald Rousseau for useful discussions.
Now we continue in this way. Assume that we have a set T of i+n+1 connected nodes $\{u_1, \ldots, u_{i+1}, u_{i+2}, \ldots, u_{i+n+1}\}$ from which we already derived that
$$\sum_{j=1}^{i} \alpha_j \geq \frac{i(i+1)}{2} + n\,i$$
and for which we know that $\forall\, j \leq i$ each point in the set T has at least j points in the set T at a distance 0 < d ≤ j. We again apply connectedness to get a new node $u_{i+n+2}$ at a distance d, 0 < d ≤ i, to at least i points in T, leading to
$$\sum_{j=1}^{i} \alpha_j \geq \frac{i(i+1)}{2} + (n+1)\,i.$$
Again we observe that $\forall\, j \leq i$, each point in $T^* = \{u_1, \ldots, u_{i+1}, u_{i+2}, \ldots, u_{i+n+2}\}$ has a distance d, 0 < d ≤ j, to at least j points in the set T*. This procedure ends with n = N-i-2, for which
$$\sum_{j=1}^{i} \alpha_j \geq \frac{i(i+1)}{2} + (N-i-1)\,i = \sum_{j=1}^{i} (N-j).$$
Such a network may look like the one shown in Fig. 1.
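The inequality reached by this construction can also be checked exhaustively for very small N. The sketch below enumerates every labelled graph on N ≤ 5 nodes, keeps the connected ones, and tests that each distance frequency array lies between the array of the complete network and that of the chain. All function names are hypothetical, and majorization is assumed to mean the partial-sum condition with equal totals. The same enumeration confirms that the arrays (4,1,1) and (4,3,3,0) discussed in remark 5 are not realized by any network.

```python
from collections import deque
from itertools import accumulate, combinations


def distance_frequencies(n, edges):
    """alpha array (as a tuple) of a graph on nodes 0..n-1,
    or None if the graph is not connected."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    alpha = [0] * (n - 1)
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < n:
            return None
        for t in range(s + 1, n):
            alpha[dist[t] - 1] += 1
    return tuple(alpha)


def realizable_arrays(n):
    """All alpha arrays occurring among connected labelled graphs on n nodes
    (brute force over 2^(n(n-1)/2) edge subsets, so only for tiny n)."""
    pairs = list(combinations(range(n), 2))
    found = set()
    for mask in range(1 << len(pairs)):
        edges = [p for k, p in enumerate(pairs) if (mask >> k) & 1]
        alpha = distance_frequencies(n, edges)
        if alpha is not None:
            found.add(alpha)
    return found


def extended_majorizes(x, y):
    """Partial sums of x dominate those of y, with equal totals."""
    sx, sy = list(accumulate(x)), list(accumulate(y))
    return sx[-1] == sy[-1] and all(a >= b for a, b in zip(sx, sy))


for n in range(2, 6):
    chain = tuple(range(n - 1, 0, -1))                # (N-1, N-2, ..., 1)
    complete = (n * (n - 1) // 2,) + (0,) * (n - 2)   # (N(N-1)/2, 0, ..., 0)
    arrays = realizable_arrays(n)
    ok = all(extended_majorizes(complete, a) and extended_majorizes(a, chain)
             for a in arrays)
    print(n, len(arrays), ok)

print((4, 1, 1) in realizable_arrays(4))      # False
print((4, 3, 3, 0) in realizable_arrays(5))   # False
print((4, 4, 2, 0) in realizable_arrays(5))   # True
```

Such a check is no substitute for the proof, since the number of graphs grows too quickly for the enumeration to go much beyond N = 6.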