Open Access

SPAG: A New Measure of Spatial Agglomeration. Theoretical Background and Empirical Examples



Introduction

Arbia et al. (2015) presented a number of statistical approaches to the study of the spatial concentration and dispersion of economic activities. They noted that the analysis of the spatial location of economic activities relies on one of two approaches: “a mesoeconomic approach (looking at the distribution of the agents within geographical partitions such as administrative units, regions) or an approach based on the individual geo-localizations of firms”.

The measurement of spatial concentration or specialisation has a long history, and many papers have addressed this problem. Generally, there are two types of measures of spatial agglomeration, concentration or specialisation: 1) cluster-based measures, and 2) distance-based measures. Of the two, cluster-based measures seem to be more popular. An interesting review was delivered in the recently published book Measuring regional specialisation. A new approach (2017). Measures of the first type usually divide a territory into a finite number of regions and use aggregated data describing an economic activity (for example the Gini index or the Location Quotient, see Marcon and Puech 2010). These measures mainly suffer from the Modifiable Areal Unit Problem (MAUP; Arbia 2001a, Morphet 1997) because they are sensitive to the shape or size of spatial units. A few papers suggest improving cluster-based indices with the use of a spatial weights matrix W (Arbia 2001b; Arbia, Piras 2009; Guillain, Le Gallo 2010; Sohn 2014). This matrix represents the spatial structure of regions and adds information about possible spatial autocorrelation within regions.

Ripley (1976) proposed a descriptive statistic for detecting deviations from spatial homogeneity in the process. He introduced the so-called K function as an integral of the g-function with respect to the distance r: $$K(r) = 2\pi \int_0^r g(\rho)\, d\rho.$$

Ripley (1976) also delivered a rigorous foundation for the second-order analysis of stationary point processes on general spaces and introduced the K function. The modelling of spatial patterns was continued by Ripley (1977), with the term ‘model’ understood as the distribution of a simple second-order point process that is strictly stationary under a rigid motion. For point processes in the plane, the K function is a measure of the distribution of the inter-point distances.

Distance-based measures (Marcon, Puech 2003, 2010; Duranton, Overman 2005, 2008) use the real locations of firms in geographical space and are based on Ripley’s K function and its many modifications (Baddeley 2000; Penttinen et al. 1992; Penttinen 2006). These measures avoid the MAUP, but their results are presented in an inconvenient form, as a chart of the K function. Kopczewska (2017) proposed a new empirical measure of spatial agglomeration (SPAG) of economic activity based on the geolocation of firms. This measure is a distance-based measure and corresponds to a geometric model of spatial agglomeration (Marcon, Puech 2003, 2010; Duranton, Overman 2005, 2008; Arbia et al. 2010; Mori, Smith 2014).

The aim of our paper is to introduce the theoretical background of SPAG, because this issue is not addressed in the original work by Kopczewska (2017). In this paper, we provide the background for this empirical indicator, using primarily the concept of the spatial process. Then, we assume that SPAG can be considered as a product of two random variables with beta and gamma distributions. The moments of the product are described and estimated for Poland, with the centroids of LAU2 units treated as the geo-localisations of firms for the reference distribution. In this paper we also follow the idea of geometrical probability. With reference to some older and more contemporary papers related to this approach, we present the results obtained for the analysis of a regular region.

In the second section of the paper we construct the SPAG index from a theoretical point of view, while Section 3 analyses statistical properties of the index, both theoretical and empirical. Section 4 presents a geometrical probability approach to SPAG and is followed by the conclusion.

Construction of the index

The theoretical construction of SPAG presents an approach that includes the density measurement of economic activity in a given area. Similarly to the other known distance-based measures, it starts with individually geolocated firms. Therefore, it can be applied to any territorial division, and the problem of zoning (MAUP) is naturally avoided. The idea differs from the concepts of Ripley’s K function or kernel estimation (Ripley 1977). Instead of using a function, we propose an index of spatial agglomeration (SPAG) based on the geometrical representation of firms by circles with radii depending on the size of firms.

The index is introduced taking into account the more general framework of spatial random processes. Its construction is as follows:

1) We define a spatial random process, where a random variable describes some economic activity.

2) We introduce a parameter κ describing the spatial concentration/agglomeration with respect to the five conditions of a ‘good measure’ described by Marcon and Puech (2010). The ideal measure of concentration: (i) compares geographic concentration results across industries, (ii) controls industrial concentration, (iii) controls the overall aggregation patterns of industries, (iv) tests the significance of the results, and (v) keeps the empirical results unbiased across geographic scales.

3) We formulate the SPAG index for spatial concentration/agglomeration as an estimator of the parameter κ.

Let us consider a set S = {s: s = (φ, λ)}, where φ, λ are coordinates in any two-dimensional coordinate system (e.g. on the plane, on the WGS84 ellipsoid, geographical coordinates, UTM coordinates, etc.). The coordinates φ, λ are restricted by some additional conditions in such a way that the set S is limited by some boundary ∂S. Let X(s) be a random variable associated with any point s ∈ S. For further simplification, we restrict the random variables X(s) to be independent and identically distributed with a given CDF f. Under the above conditions, {X(s): s ∈ S} is a spatial random process. We will interpret the points of S as possible places of economic activity.

Let us introduce a metric function d over the space S that fulfils the standard conditions for a distance measure:

$d(s_i, s_j) = 0$ if and only if $i = j$,

$d(s_i, s_j) = d(s_j, s_i)$,

$d(s_i, s_k) + d(s_k, s_j) \ge d(s_i, s_j)$.

The pair (S, d) creates a metric space. Let us define A ⊆ S as a field (‘region’) bounded by ∂S. The area of A, denoted by |A|, is equal to $$\iint_S \partial S \; d\varphi \, d\lambda.$$

Let us assume that the parameter κ is defined over the random field {X(s): s ∈ S}. It is expected to fulfil the five conditions for a good measure of spatial concentration (Marcon, Puech 2010). The parameter is unobservable and it will be constructed as a selected function of three parameters, $$\kappa = \kappa(\theta_c, \theta_d, \theta_o),$$ where $\theta_c, \theta_o \in [0,1]$ and $\theta_d \in [0,\infty)$.

We take the product, as it is the simplest function preserving monotonicity with respect to each of these three parameters: $$\kappa = \theta_c \cdot \theta_d \cdot \theta_o.$$

The aim of the study is to find a ‘proper’ estimator for κ. For this, we consider a finite sample from the random field {X(s): s ∈ S}, written as a random vector of finite length n: $$\mathbf{Z}(s) = (X(s_1), X(s_2), \ldots, X(s_n)).$$

Denote its realisation by $$\mathbf{z}(s) = (x(s_1), x(s_2), \ldots, x(s_n)),$$ where $x(s_i)$ are the values of the random variables associated with the points $s_i = (\varphi_i, \lambda_i)$.

In the next step of the construction of the estimator, we convert the vector z(s) to the vector z(r) with the use of a special transformation (let us call it T) related to the radii r. In the beginning, we focus on the location of n economic activity units. In the empirical distribution, each point $s_i = (\varphi_i, \lambda_i)$ of the n economic activity locations is covered by a circle, the area of which is proportional to some feature of the location, for example, employment. The radius $r_i$ of the i-th circle might be a continuous variable for precise data on employment, or discrete for interval data. The sum of the areas $|A_i|$ of the n circles is equal to the area |A| of the region. The radii of the circles create the “impact zones of economic activity”, which are bigger in the case of larger firms. Setting circles in real business locations is intended to reflect the phenomenon of spatial agglomeration or other spatial patterns. As a consequence, we obtain the vector z(r): $$\mathbf{z}(r) = (r_1, r_2, \ldots, r_n),$$ where the $r_i$ are calculated by $$r_i = \sqrt{\frac{x(s_i)\,|A|}{\pi \sum_{j=1}^{n} x(s_j)}}.$$
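The radius formula can be illustrated with a minimal Python sketch. The function name `circle_radii` and the four firm sizes are hypothetical and serve only as an illustration; they do not come from the original study.

```python
import math

def circle_radii(sizes, region_area):
    """Radius r_i = sqrt(x_i * |A| / (pi * sum_j x_j)) for every firm size x_i."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("the total size must be positive")
    return [math.sqrt(x * region_area / (math.pi * total)) for x in sizes]

# Hypothetical data: four firms (employment 1, 4, 10, 35) in a region of area 100.
radii = circle_radii([1, 4, 10, 35], 100.0)

# By construction, the circle areas add up to the area of the region.
total_circle_area = sum(math.pi * r * r for r in radii)
```

Larger firms receive larger circles, and the total circle area equals |A| regardless of how the sizes are distributed.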

For the discrete case (the distribution of radii is discrete), a classification function g: Z(s) → {1, 2, ..., k}, where k is the number of classes, is defined. Then, for each class i, the radius is calculated with the use of the following formula: $$r_i = \sqrt{\frac{|A| \sum_{\{j:\, g(x(s_j)) = i\}} x(s_j)}{n_i\, \pi \sum_{j=1}^{n} x(s_j)}},$$ where $n_i$ is the number of elements in class i. The factor |A| ensures, as in the continuous case, that the areas of all n circles sum to the area of the region.
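A short Python sketch of the discrete case follows. It assumes that the class radius is scaled by |A|, so that the circle areas still sum to the area of the region, as required in the continuous case. The class bounds follow the employment classes of Table 1; the function names and the sample sizes are hypothetical.

```python
import math
from collections import defaultdict

def size_class(x):
    """Class 1..5 for the employment bands 1-9, 10-49, 50-249, 250-999, 1000 and more."""
    for i, upper in enumerate([9, 49, 249, 999], start=1):
        if x <= upper:
            return i
    return 5

def class_radii(sizes, classify, region_area):
    """One radius per class, based on the mean firm size within the class."""
    total = sum(sizes)
    sums, counts = defaultdict(float), defaultdict(int)
    for x in sizes:
        c = classify(x)
        sums[c] += x
        counts[c] += 1
    radii = {c: math.sqrt(region_area * sums[c] / (counts[c] * math.pi * total))
             for c in sums}
    return radii, dict(counts)

# Hypothetical firm sizes in a region of area 1000.
radii, counts = class_radii([1, 3, 5, 12, 40, 60, 300, 1200], size_class, 1000.0)

# Every firm in class c gets the radius radii[c]; the total circle area is again |A|.
total_circle_area = sum(counts[c] * math.pi * radii[c] ** 2 for c in radii)
```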

Assuming that the radii are equal for all the places representing a location of economic activity, the following formula is used to calculate the radii $r_i$: $$r_i = \sqrt{\frac{|A|}{n\pi}}.$$

After the T transformation, a circle with the center in point si and radius ri represents each element of vector z(s). The metric space (S, d) is invariant with respect to the T transformation.

Now, let us introduce SPAG as an estimator of the parameter κ. It is defined as $$SPAG = \widehat{\theta_c \cdot \theta_d \cdot \theta_o}.$$

We assume that $$\widehat{\theta_c \cdot \theta_d \cdot \theta_o} \approx \widehat{\theta_c}\, \widehat{\theta_d}\, \widehat{\theta_o},$$ which gives us the possibility to estimate each factor in the product separately.

The first factor of the above product is an estimator of the parameter $\theta_c$. For its specification, we use a subvector $\mathbf{z}^*(r) = (r_1^*, r_2^*, \ldots, r_l^*)$ of the vector $\mathbf{z}(r) = (r_1, r_2, \ldots, r_n)$, where $l \le n$. Then, the estimator of $\theta_c$ is defined by $$\widehat{\theta_c} = \frac{\pi \cdot \sum_{i=1}^{l} (r_i^*)^2}{|A|}.$$

The range of the values of the estimator is the interval [0,1].
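The coverage estimator can be sketched in a few lines of Python. The data and the function name `coverage_index` are hypothetical; the radii are derived from firm sizes with the radius formula given earlier.

```python
import math

def coverage_index(selected_radii, region_area):
    """Share of the region area taken by the circles of the selected firms."""
    return math.pi * sum(r * r for r in selected_radii) / region_area

# Hypothetical region of area 100 with four firms of size 1, 4, 10 and 35.
sizes, area = [1, 4, 10, 35], 100.0
radii = [math.sqrt(x * area / (math.pi * sum(sizes))) for x in sizes]

full = coverage_index(radii, area)          # all n firms selected (l = n)
partial = coverage_index(radii[2:], area)   # only the two largest firms (l = 2)
```

With all firms selected the index reaches its upper bound of 1; for a subvector it equals the share of the selected firms in the total size, here (10 + 35) / 50.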

For the specification of the estimator of $\theta_d$, a metric (distance) function d over the space S has to be given. Then, the estimator of the parameter $\theta_d$ is as follows: $$\widehat{\theta_d} = \frac{\sum_{i=1}^{l} \sum_{j=1}^{l} d(i,j)}{l^2 \cdot \bar d},$$ where $\bar d$ is the mean distance for the uniform spatial distribution of l locations of economic activities. The values of this estimator belong to the interval [0,∞).
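A minimal Python sketch of the distance estimator is given below. It assumes the Euclidean metric and takes the reference mean distance from a regular grid with the same number of points; both choices, the function names and the 5 × 5 example are illustrative assumptions.

```python
import math

def mean_pairwise_distance(points):
    """Sum of d(i, j) over all ordered pairs divided by l^2 (self-pairs count as 0)."""
    l = len(points)
    return sum(math.dist(p, q) for p in points for q in points) / (l * l)

def distance_index(points, reference_points):
    """Mean distance between firms relative to the mean distance on the reference set."""
    return mean_pairwise_distance(points) / mean_pairwise_distance(reference_points)

# Reference: a regular 5 x 5 grid. Three hypothetical firm layouts with l = 25.
grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
one_place = [(2.0, 2.0)] * 25
two_corners = [(0.0, 0.0)] * 12 + [(4.0, 4.0)] * 13

i_d_uniform = distance_index(grid, grid)
i_d_one_place = distance_index(one_place, grid)
i_d_two_corners = distance_index(two_corners, grid)
```

The three layouts show the behaviour of the estimator: it equals one for the reference layout, zero when all firms share one location, and exceeds one when firms form two distant clusters.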

Finally, the estimator of the parameter $\theta_o$ is specified by the formula $$\widehat{\theta_o} = \frac{\left| \bigcup_{i=1}^{l} A_i \right|}{\pi \cdot \sum_{i=1}^{l} (r_i^*)^2},$$ where $A_i$ is a circle with the centre at the point $s_i = (\varphi_i, \lambda_i)$ and radius $r_i$. The range of values of the estimator $\widehat{\theta_o}$ is the interval [0,1].
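The area of the union of circles has no simple closed form, so the overlap estimator is usually approximated numerically. The sketch below counts the cells of a fine grid whose centres fall inside at least one circle. This is only one possible approximation and an assumption of this example, not necessarily the procedure used in the original study; clipping of circles at the region boundary is ignored, and the function names are hypothetical.

```python
import math

def union_area(circles, step):
    """Approximate area of a union of circles given as (x, y, r) triples."""
    xmin = min(x - r for x, y, r in circles)
    xmax = max(x + r for x, y, r in circles)
    ymin = min(y - r for x, y, r in circles)
    ymax = max(y + r for x, y, r in circles)
    nx = math.ceil((xmax - xmin) / step)
    ny = math.ceil((ymax - ymin) / step)
    covered = 0
    for ix in range(nx):
        cx = xmin + (ix + 0.5) * step          # centre of the grid cell
        for iy in range(ny):
            cy = ymin + (iy + 0.5) * step
            if any((cx - x) ** 2 + (cy - y) ** 2 <= r * r for x, y, r in circles):
                covered += 1
    return covered * step * step

def overlap_index(circles, step=0.02):
    """Union area divided by the sum of the individual circle areas."""
    return union_area(circles, step) / sum(math.pi * r * r for _, _, r in circles)

disjoint = overlap_index([(0.0, 0.0, 1.0), (5.0, 0.0, 1.0)])    # no overlap
stacked = overlap_index([(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)])     # full overlap
```

Two disjoint circles give a value close to 1, while two identical circles placed on top of each other give a value close to 0.5, since the union is only half of the summed areas.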

Following Kopczewska (2017), we will denote the estimators $\widehat{\theta_c}$, $\widehat{\theta_d}$, $\widehat{\theta_o}$ as the indices $I_c$, $I_d$, $I_o$ respectively. We will call these indices an index of coverage, an index of distance and an index of overlap. All components have an economic interpretation.

According to Kopczewska (2017), the SPAG index as a measure of the degree of spatial agglomeration allows: 1) comparing regions or sectors over time, and 2) comparing sectors inside a region and between regions. The SPAG index uses a combination of information on the area, size and sectors of companies. Thus, the application of SPAG by policy makers, as well as the comparison of regions and/or sectors over time, is straightforward. SPAG takes values in the interval [0, M], where $M = SPAG_{max}$ is the maximum value of SPAG, estimated with some small bias as the ratio $$SPAG_{max} = \frac{d_{max}}{\mathbf{E}(d)},$$ with $d_{max}$ being the maximum distance within the region, and E(d) being the expected value of the distribution of distances within a given region.

The bias of $SPAG_{max}$ is related to the fact that E(d) is slightly higher than the mean of the distances between the locations of firms in a reference spatial distribution; in consequence, $SPAG_{max}$ is usually underestimated. Values of the index exceeding 3 were not reported in empirical studies.

Statistical properties of SPAG

In this section, we briefly discuss some of the statistical properties of SPAG. For this purpose we will introduce some results reported in a paper by Nadarajah and Kotz (2005), and then, we will show how these results can be used in the SPAG case.

Beta and Gamma distribution case

Let us observe that SPAG is in fact a product of the following two independent factors: $$SPAG = \frac{\left| \bigcup_{i=1}^{l} A_i \right|}{|A|}\, \widehat{\theta_d}.$$

For further simplification, we will denote $$\widehat{\theta_{co}} = \frac{\left| \bigcup_{i=1}^{l} A_i \right|}{|A|}.$$

Let us assume that $\widehat{\theta_{co}}$ has a beta distribution with the probability density function $$f_{\widehat{\theta_{co}}}(\widehat{\theta_{co}}) = \frac{\widehat{\theta_{co}}^{\,a-1} \left(1 - \widehat{\theta_{co}}\right)^{b-1}}{B(a,b)},$$ for $\widehat{\theta_{co}} \in (0,1)$, a > 0, b > 0. The expected value and variance of $\widehat{\theta_{co}}$ are $$\mathbf{E}(\widehat{\theta_{co}}) = \frac{a}{a+b}, \qquad \sigma^2_{\widehat{\theta_{co}}} = \frac{ab}{(a+b)^2 (a+b+1)}.$$

For $\widehat{\theta_d}$, let us assume that it has a gamma distribution with the probability density function $$f_{\widehat{\theta_d}}(\widehat{\theta_d}) = \frac{\lambda^{\beta}\, \widehat{\theta_d}^{\,\beta-1} e^{-\lambda \widehat{\theta_d}}}{\Gamma(\beta)},$$ for $\widehat{\theta_d} > 0$, β > 0, λ > 0. The expected value and variance of $\widehat{\theta_d}$ are $$\mathbf{E}(\widehat{\theta_d}) = \frac{\beta}{\lambda}, \qquad \sigma^2_{\widehat{\theta_d}} = \frac{\beta}{\lambda^2}.$$

Following the paper by Nadarajah and Kotz (2005: 437), the probability density function of $SPAG = \widehat{\theta_{co}} \cdot \widehat{\theta_d}$ is $$f_{SPAG}(SPAG) = \frac{\lambda^{\beta}\, \Gamma(b)}{\Gamma(\beta)\, B(a,b)} \cdot SPAG^{\beta-1} e^{-\lambda \cdot SPAG} \cdot \Psi(b, 1+\beta-a; \lambda \cdot SPAG),$$ for SPAG > 0, where Ψ is the Kummer function defined by $$\Psi(a,b;x) = \frac{1}{\Gamma(a)} \int_0^{\infty} e^{-xt}\, t^{a-1} (1+t)^{b-a-1}\, dt.$$
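Because the density involves the Kummer function, a quick way to inspect the distribution of the product is to simulate it. The sketch below draws independent beta and gamma variates with Python's standard `random` module and compares the sample mean with the product of the component means. The parameter values a = 2, b = 3, β = 4, λ = 2 are arbitrary illustrations and the function name is hypothetical.

```python
import random

def sample_spag(a, b, beta, lam, n, seed=0):
    """Draw n values of the product of a Beta(a, b) and a Gamma(beta, rate lam) variate."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so the rate lam enters as 1 / lam.
    return [rng.betavariate(a, b) * rng.gammavariate(beta, 1.0 / lam)
            for _ in range(n)]

draws = sample_spag(a=2.0, b=3.0, beta=4.0, lam=2.0, n=100_000)
sample_mean = sum(draws) / len(draws)

# For independent factors the mean of the product is the product of the means.
theoretical_mean = (2.0 / (2.0 + 3.0)) * (4.0 / 2.0)
```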

The above specification of the distribution allows the moments of SPAG to be calculated. Its expected value is $$\mathbf{E}(SPAG) = \frac{a\beta}{(a+b)\lambda},$$ and the variance has the form $$\sigma^2(SPAG) = \sigma^2_{\widehat{\theta_{co}}}\, \sigma^2_{\widehat{\theta_d}} + \sigma^2_{\widehat{\theta_{co}}}\, \mathbf{E}(\widehat{\theta_d}^2) + \sigma^2_{\widehat{\theta_d}}\, \mathbf{E}(\widehat{\theta_{co}}^2).$$

Taking into consideration that $$\mathbf{E}(\widehat{\theta_{co}}^2) = \frac{a(a+1)}{(a+b)(a+b+1)} \quad \text{and} \quad \mathbf{E}(\widehat{\theta_d}^2) = \frac{\beta(\beta+1)}{\lambda^2},$$ after some calculations we obtain $$\sigma^2(SPAG) = \frac{\beta a (3b + \beta b + a^2 + ab + a)}{(a+b)^2 (a+b+1) \lambda^2}.$$
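The moment expressions can be evaluated directly. The Python sketch below transcribes them exactly as printed above and applies them to the parameter values estimated for Poland in the empirical section (a = 0.027916752, b = 17.30099551, β = 686.0985, λ = 725.829). The function names are hypothetical.

```python
def spag_mean(a, b, beta, lam):
    """Expected value of SPAG: a * beta / ((a + b) * lam)."""
    return a * beta / ((a + b) * lam)

def spag_variance_as_printed(a, b, beta, lam):
    """Closed-form variance expression, transcribed from the text."""
    numerator = beta * a * (3 * b + beta * b + a ** 2 + a * b + a)
    return numerator / ((a + b) ** 2 * (a + b + 1) * lam ** 2)

def spag_variance_three_terms(a, b, beta, lam):
    """The same quantity assembled from the three-term expression in the text."""
    var_co = a * b / ((a + b) ** 2 * (a + b + 1))
    var_d = beta / lam ** 2
    second_co = a * (a + 1) / ((a + b) * (a + b + 1))
    second_d = beta * (beta + 1) / lam ** 2
    return var_co * var_d + var_co * second_d + var_d * second_co

# Parameter values reported for Poland.
params = (0.027916752, 17.30099551, 686.0985, 725.829)
mean_pl = spag_mean(*params)
var_pl = spag_variance_as_printed(*params)
```

Evaluating both variance functions with the same arguments confirms that the closed form is the algebraic simplification of the three-term expression.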

Using formulas from the paper by Nadarajah and Kotz (2005), the percentage points $z_p$ associated with the CDF of SPAG are available. These points can be obtained numerically from the following equation: $$\frac{1}{\Gamma(\beta) B(a,b)} \left[ \frac{(\lambda z_p)^{\beta}}{\beta} B(b, a-\beta)\, H(1-a-b+\beta, \beta; \beta+1, 1-a+\beta; -\lambda z_p) + \frac{(\lambda z_p)^{a}}{a} \Gamma(\beta-a)\, H(a, 1-b; a-\beta+1, 1+a; -\lambda z_p) \right] = p,$$ where H() is a hypergeometric function.

Empirical case – the distribution of SPAG for Poland

In this section we analyse a specific empirical case: the distribution of SPAG for Poland, obtained with the use of formulas (4) and (5). In this case, we have to find the parameters of the distributions formulated in (2) and (3). For the beta distribution, we will use the following estimators of the parameters: $$\hat a = \frac{\hat\mu \hat b}{1 - \hat\mu}, \qquad \hat b = \frac{(1 - \hat\mu)^2 \hat\mu}{(1 - \hat\mu - \hat\mu^2)^2 \hat\sigma^2} + \hat\mu - 1,$$ where $\hat\mu$ and $\hat\sigma^2$ are estimators of the expected value and variance, respectively, of a random variable X with the B(a,b) distribution.

The parameters of the Γ distribution are obtained from their estimators, given by the formulas $$\hat\beta = \frac{\hat\mu^2}{\hat\sigma^2}, \qquad \hat\lambda = \frac{\hat\mu}{\hat\sigma^2},$$ where $\hat\mu$ is the estimator of the expected value and $\hat\sigma^2$ is the estimator of the variance of a random variable Y with the Γ(β,λ) distribution.
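The gamma estimators are simple enough to check on simulated data. The Python sketch below assumes that the unbiased sample variance is used for $\hat\sigma^2$ (the text does not specify this); the sample, with true values β = 4 and λ = 2, and the function name are hypothetical.

```python
import random
import statistics

def gamma_moment_estimates(sample):
    """Return (beta_hat, lambda_hat) computed from the sample mean and variance."""
    mu = statistics.fmean(sample)
    var = statistics.variance(sample)   # unbiased sample variance (an assumption)
    return mu * mu / var, mu / var

# Simulated sample; random.gammavariate takes (shape, scale), so scale = 1 / lambda.
rng = random.Random(1)
sample = [rng.gammavariate(4.0, 1.0 / 2.0) for _ in range(50_000)]
beta_hat, lambda_hat = gamma_moment_estimates(sample)
```

By construction, the ratio of the two estimates reproduces the sample mean.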

The estimation procedure is based on the following datasets for Poland:

1) a georeferenced set of LAU2 spatial units in Poland (communes),

2) the distribution of firms in different employment size classes,

3) the total number of firms located in every spatial unit from the georeferenced set mentioned in 1).

The georeferenced set of spatial units is written in a commonly used shapefile format and has information about 2,479 spatial units. The data considered in 2) and 3) were obtained from Statistics Poland in Warsaw.

The maximum number of employees was restricted to 5,000. Using the data from Table 1, the estimation procedure led to the following distribution of X: B(0.027916752,17.30099551). The distribution is asymmetric and its PDF is presented in Fig. 1.

Table 1. Classes of the employment size of firms in Poland.

| Class of employment | Number of firms |
|---|---|
| 1–9 | 3,938,654 |
| 10–49 | 146,926 |
| 50–249 | 29,610 |
| 250–999 | 3,706 |
| 1000 and more | 775 |
| Total | 4,119,671 |

Source: Statistics Poland.

Fig. 1

PDF for B(0.027916752,17.30099551) distribution.

Source: own study.

Estimation of the parameters of the Γ distribution is more complicated. Let us observe that $$\mathbf{E}(Y) = \frac{1}{d}\, \mathbf{E}\left( \frac{\sum_{i=1}^{l} \sum_{j=1}^{l} d(i,j)}{l^2} \right),$$ where (i, j) are coordinates of firms. We assume that all firms in each spatial unit are located at the centroid of that unit, so we assigned the total number of firms in every spatial unit to its centroid. Then, we used p = 9,999 permutations of the numbers of firms in the analysed spatial units. In this way, we estimated both the mean distance from one firm to another and its variance. The mean distance was equal to 289 km.

The second problem was to obtain the value of d, which is necessary for the estimation of the parameters β and λ. Firstly, using some procedures in R, we simulated a regular grid distribution of firms over Polish territory (n = 4,118,671). Then, we found d, which is equal to the mean distance in this grid. It should be underlined that calculating n(n-1)/2 distances is time-consuming (the number of distances is larger than $8 \cdot 10^{12}$, and holding them requires an extremely large amount of memory). So, we decided to use a permutation procedure for the estimation of d.

In the procedure, we permuted the rows of a two-column matrix of point coordinates on the reference-distribution grid p = 10,000 times, and each time we calculated the mean distance between the original and the permuted matrix of coordinates, which gave 10,000 mean distances. The mean value of those estimates was treated as the estimate of d. The dispersion between the minimum and maximum estimated distance was acceptable, being no larger than 0.7 km. The obtained estimate of the parameter d was 292.0782 km.
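The permutation procedure can be sketched in Python on a small example. The authors worked in R on a grid of several million points; here a hypothetical 10 × 10 grid with unit spacing and 200 permutations is used, so that the estimate can be compared with the exact mean over all ordered pairs. The function name is an assumption of this sketch.

```python
import math
import random

def permutation_mean_distance(points, n_perm, seed=0):
    """Estimate the mean inter-point distance by pairing points with a shuffled copy."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_perm):
        shuffled = points[:]          # permute the rows of the coordinate matrix
        rng.shuffle(shuffled)
        mean_d = sum(math.dist(p, q) for p, q in zip(points, shuffled)) / len(points)
        estimates.append(mean_d)
    return sum(estimates) / n_perm, min(estimates), max(estimates)

grid = [(float(i), float(j)) for i in range(10) for j in range(10)]
d_hat, d_min, d_max = permutation_mean_distance(grid, n_perm=200)

# Exact mean over all ordered pairs, feasible only because the grid is small.
d_exact = sum(math.dist(p, q) for p in grid for q in grid) / len(grid) ** 2
```

Each permutation needs only n distance evaluations instead of n(n-1)/2, which is what makes the approach practical for large grids. The spread between `d_min` and `d_max` plays the role of the dispersion check mentioned above.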

Finally, we calculated the values of β and λ for the Γ distribution; they were equal to 686.0985 and 725.829 respectively. The estimated PDF is presented in Fig. 2. The last part is the calculation of the parameters and moments of the SPAG distribution using formulas (4) and (5). The result is as follows: $$f_{SPAG}(SPAG) = \frac{725.829^{686.0985}\, \Gamma(17.30099551)}{\Gamma(686.0985)\, B(0.027916752, 17.30099551)}\, SPAG^{685.0985} \cdot \exp(-725.829 \cdot SPAG) \cdot \Psi(17.30099551, 687.070583; 725.829 \cdot SPAG),$$ $$\mathbf{E}(SPAG) = 0.00152281,$$ $$\hat\sigma^2(SPAG) = 0.0000787543.$$

Fig. 2

PDF for Γ(686.0985,725.829) distribution.

Source: own study.

Let us notice that in the analysed empirical case, both the expected value and the variance of SPAG are relatively small.

Geometric probability approach to SPAG

Another approach to SPAG properties has its origin in a geometric probability concept. Although the idea of geometric probability is quite old, it has not received much attention in recent studies. There exist some traditions in applications of the geometric probability to spatial analysis. One of the crucial problems related to the use of a geometric probability concept is the question of the distribution of distances between two random points within a given region. To make it simple, it is often assumed that a given region has a regular shape in one of the geometrical forms: triangle, square, parallelogram, rhombus, hexagon or finally, a form of circle. It is also worth mentioning, that this problem is from time to time rediscovered (Alagar 1976). Some bases of the geometric probability are described in the early book written by Kendall and Moran (1962). Additional notes related to the concept of geometric probability were presented by Moran (1966, 1969).

The results of the research on geometric probability, applied to our work, i.e. distance probability distributions for a regular region were published in the mid-1970s by Alagar (1976). This work was extended by two researchers: Zhuang and Pan (2011, 2017). They calculated PDF and CDF functions as well as an expected value and variance for distances between two random points in a hexagon. The theoretical results of their work were confirmed by a series of numerical simulations.

A case of a regular region with a fixed location of firms

Let us consider a case of a regular region, when the shape of a region is a hexagon. In this region, a set of n = 4,910 firms is located. The set of firms with their locations was taken from the Statistics Poland database. The borders of the region were created artificially by their mathematical definition on the map (with respect to a map projection). The location of firms is shown in Fig. 3. In this research, our question is what the distribution of SPAG is.

Fig. 3

Firms distribution in a hexagonal region.

Source: own study.

In the beginning, the possible range of SPAG in a hexagonal region was found. For the case studied, the distance probability function $f_{D_{H_I}}(d)$, where 0 < d < 2 in a unit hexagon, was applied using formula (3) from Zhuang and Pan (2011). The expected value of d is equal to $$\mathbf{E}(d) = \int_0^2 x\, f_{D_{H_I}}(x)\, dx = 0.826.$$
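The value of E(d) can be cross-checked by simulation. The Python sketch below samples pairs of points uniformly in a regular hexagon with side 1 using rejection sampling and averages their distances. It is only a Monte Carlo check, not the closed-form density of Zhuang and Pan (2011), and the function names are hypothetical.

```python
import math
import random

SQRT3 = math.sqrt(3.0)

def random_point_in_unit_hexagon(rng):
    """Uniform point in a regular hexagon with side 1 centred at the origin."""
    while True:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-SQRT3 / 2.0, SQRT3 / 2.0)
        # Keep the point only if it lies inside the four slanted edges.
        if SQRT3 * abs(x) + abs(y) <= SQRT3:
            return x, y

def mean_distance_in_unit_hexagon(n_pairs, seed=0):
    """Average distance between n_pairs independent pairs of random points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        p = random_point_in_unit_hexagon(rng)
        q = random_point_in_unit_hexagon(rng)
        total += math.dist(p, q)
    return total / n_pairs

simulated_mean = mean_distance_in_unit_hexagon(100_000)
```

The largest possible distance in this hexagon is 2 (between opposite vertices), which matches the range 0 < d < 2 stated above.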

In our case, the side length of the hexagon (as shown in Fig. 3) is s = 159.286 km. Then, the expected value of the distance between two random points in this hexagon is equal to 131.570 km. Thus, the estimate of the maximum value of SPAG is $$SPAG_{max} = \frac{2s}{\mathbf{E}(d)} = \frac{2}{0.826} = 2.422.$$
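The arithmetic behind these two numbers can be written out in a few lines of Python. The variable names are illustrative; the inputs 0.826 and 159.286 km are the values quoted in the text.

```python
unit_mean_distance = 0.826      # E(d) in a hexagon with side 1
side_km = 159.286               # side length s of the studied hexagon

# Distances scale linearly with the side length of the hexagon.
mean_distance_km = unit_mean_distance * side_km
max_distance_km = 2.0 * side_km             # distance between opposite vertices

# The side length cancels, so the ratio depends on the shape only.
spag_max = max_distance_km / mean_distance_km
```

The result reproduces the mean distance of 131.570 km and, up to the rounding of E(d) to three decimals, the value of 2.422 for the maximum of SPAG.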

The size of firms was sampled from an exponential distribution with parameter 1/2 and then rounded down to the nearest integer. The parameter value of 1/2 was selected due to the fact that the expected value of the distribution is 2.57, i.e. the mean of the real size of employment distribution in firms. For the known size of employment, the circles located around the firms and their union set were obtained. The empirical distribution of $\widehat{\theta_{co}}$ was obtained with the use of p = 9,999 permutations of the set of circle sizes. Then, the parameters of the B distribution for $\widehat{\theta_{co}}$ were estimated by applying formula (6). The distribution is presented in Fig. 4.

Fig. 4

PDF for B(20078, 61748) distribution.

Source: own study.

Repeating the procedure described above, we estimated the theoretical average distance $\bar d_{grid}$ between firms located on a reference hexagonal grid inside the region, and the distribution of $\widehat{\theta_d}$. Compared to the previous analyses, the number of permutations was reduced to p = 999 (because the calculations are time-consuming). Finally, we obtained $\bar d_{grid} = 131.036$ km, while the real average distance between the locations of firms is $\bar d_{real} = 7.807$ km. Let us observe that the value of $\bar d_{grid}$ is slightly smaller than E(d) for this hexagon. The associated Γ distribution and its parameters are shown in Fig. 5.

Fig. 5

Histogram and PDF for Γ(23692,39900) distribution.

Source: own study.

Finally, the empirical distribution of SPAG was found. It is shown in Fig. 6. The moments of the distribution were obtained from formulas (4) and (5). The expected value is equal to 0.146, and the variance is $3.55 \cdot 10^{-6}$.

Fig. 6

Histogram for SPAG.

Source: own study.

Comparing the theoretical results for the moments of the SPAG distribution with the empirical moments, it should be noticed that in this case the theoretical distribution of SPAG does not describe the empirical distribution with sufficient accuracy. The histogram in Fig. 6 shows that the empirical mean is equal to 0.413, and is far away from the theoretical value of 0.146. The theoretical variance is more than 10 times larger than the empirical value which is equal to $2.2 \cdot 10^{-5}$.

A case of a regular region with a simulated-random location of firms

One more approach to a regular region case (for example triangles, squares or hexagons) is the modification of the analysis conducted in the previous subsection. As above, we assume a regular hexagonal region with n = 4,910 firms randomly located within it. The size of employment in firms was randomly assigned using distribution as in the previous case. In Fig. 7, the random distribution of firms within a regular hexagonal region is presented.

Fig. 7

Firms distribution in a hexagonal region.

Source: own study.

Then, permuting the set of firm sizes p = 999 times, the empirical distribution of $\widehat{\theta_{co}}$ was determined. The distribution and its parameters are shown in Fig. 8. Next, the average distance between the real locations of firms and the mean distance between firms in a uniform hexagonal distribution across the region were estimated. We found $\bar d_{grid} = 131.036$ km and $\bar d_{real} = 131.955$ km.

Fig. 8

PDF for B(19797, 60830) distribution.

Source: own study.

Consequently, the empirical distribution of $\widehat{\theta_d}$ was obtained. As was mentioned earlier, $\widehat{\theta_d}$ is assumed to have the Γ probability density function. The empirical distribution of $\widehat{\theta_d}$ is shown (with its parameters) in Fig. 9.

Fig. 9

Histogram and PDF for Γ(30270,30049) distribution.

Source: own study.

In the last step, we calculated the distribution of SPAG. Its expected value is equal to 0.247, while the variance is $3.55 \cdot 10^{-6}$. The histogram in Fig. 10 presents the empirical distribution of SPAG. The moments of the empirical distribution are similar to the theoretical ones. The empirical expected value is equal to 0.243, so it is only slightly different from the theoretical value. The variance of the empirical distribution is also very close to the theoretical result.

Fig. 10

Histogram of SPAG.

Source: own study.

Conclusion

In the study of socio-economic processes, many indexes are used, but only some of them have an appropriate theoretical framework that allows inferential statistical methods to be applied in the analysis. The theoretical construction of SPAG, with the use of the assumed distributions of its components, makes it possible to specify the distribution and moments of SPAG (an approach based on the properties of the beta and gamma distributions). This allows the construction of exact statistical significance tests useful for the comparison of empirically obtained values. However, the obtained moments of the theoretical distribution were not very close to the empirical ones. This suggests that the approach should be refined by additional research.

The second approach, inspired by papers on geometric probability and on distance distributions in regular regions, allowed us to find the range of SPAG and to verify the average distance between firms in a reference regular distribution. The calculated empirical distributions of SPAG for the set of firms located in a hexagonal region, as well as the theoretical and empirical moments of these distributions, turn out to be promising for the characterisation of the SPAG distribution.

From an empirical point of view values of the SPAG index are interpreted as follows: when all firms are located in one place within a given region, then the index value is equal to zero. The value of SPAG is equal to one when firms are located according to uniform spatial distribution, and the size of employment in all firms is the same. It is also possible, for some specific cases, that SPAG exceeds one. It is usually in the situation, when there are several clusters of firms within a given region, with large distances between them. It can be named as a border-cluster distribution of firms.

Further research on the properties of SPAG could consider its extension to include heterogeneity in the measurement, as well as an analysis of the robustness of SPAG results with respect to different methods of locating centroids.

The application of SPAG by policy makers, as well as the comparison of regions and sectors over time, is straightforward. According to Kopczewska (2017), the SPAG index can be applied for measuring the degree of spatial agglomeration (concentration, clustering). In this situation, the index gives an opportunity to follow the agglomeration process. SPAG also allows comparisons between regions regarding agglomeration or repulsion processes, and makes it possible to know how advanced these processes are in different regions. As a consequence, policy makers can decide whether or not to support agglomeration processes in a region. Finally, policy makers are able to detect whether co-located sectors of the economy build spatial clusters.

eISSN:
2081-6383
Language:
English
Publication timeframe:
4 times per year
Journal Subjects:
Geosciences, Geography