Introduction

On-line shopping is becoming an important purchasing channel alongside traditional off-line retail. According to Eurostat data, the share of e-commerce in total turnover has already reached 15% in the EU15, and in countries such as Great Britain it exceeds 20%. The European Commission (EC), aware of the fundamental importance of the digital economy, has made the completion of the Digital Single Market (DSM) one of its priorities. The aim of the DSM is to ensure cross-border access to on-line activities under conditions of fair competition throughout the EU (European Commission 2015).

According to standard theories such as the Law of One Price, there should be no significant price differences in integrated markets. One potential effect of stronger integration of digital markets between the EU member states is therefore a decrease in the cross-country dispersion of on-line prices. However, this is not what has been observed in the EU in recent years. According to Eurostat (2016), prices converged in the EU28 between 2004 and 2008, but afterwards this process stopped or even reversed. The level of price dispersion in the EU15, by contrast, has been fairly stable. Recently, the EC has undertaken several legal initiatives, such as the geo-blocking directive and the vertical restraints inquiry, to foster competition in the digital single market. The EC hopes that an increased volume of cross-border trade will further reduce price dispersion, even though the factors and implications behind it are still not fully understood. Importantly, lowering the costs of arbitrage between markets subject to third degree price discrimination could reduce price dispersion in many different ways, bringing adverse distributional effects in high and low income countries. As the economic literature suggests, there is no a priori guarantee that the net welfare effect of such an intervention will be positive unless prices in the lower income country do not increase (Malueg and Schwartz 1994).

The objective of our study is to assess the scale and determinants of current on-line price dispersion within the EU. With respect to measurement, we use web-scraped price data. Web-scraping is an innovative technique that has been used to analyse price dynamics, for example, by Cavallo (2015). The novelty of our approach lies in using data from price comparison services instead of working with a great number of individual sites, which often use incompatible web technologies. In this way, we are able to collect a large number of individual price signals from virtually all shops selling a particular item in each country. With such large and diversified data, we can measure price dispersion in a robust way and make frequent updates at relatively low cost.

With respect to the pattern of observed on-line price dispersion among member states, we analyse three broad groups of factors that influence its magnitude: (a) supply side factors; (b) demand side factors and (c) institutional factors. Understanding sources of the price dispersion is crucial for correct targeting of the policy interventions.

Our paper is organized as follows: section 2 provides a literature review, section 3 describes the data collection technique and summarizes the data, section 4 sheds light on the price dispersion pattern, and section 5 summarizes and concludes.

Literature review

An important aspect of economic convergence is the decrease of price dispersion (the difference between prices for identical products). Faber and Stokman (2009) examine long run price convergence within the EU between 1960 and 2003. To obtain such a long time series, the authors scale harmonised indices of consumer prices (HICP) with occasional observations of absolute price levels. They conclude that price differences halved during the observed period and that price convergence was driven by indirect tax rate harmonisation, convergence in input costs due to exchange rate stability, and openness of the economy. Rogers (2007) also confirms price convergence between 1990 and 2004. Using data from the Economist Intelligence Unit (EIU), he claims that price dispersion in tradable goods reached the level observed in the US. Bergin and Glick (2007) analyse global price dispersion using the EIU data and report a U-shaped pattern of price convergence between 1990 and 2005. Possible drivers of the rising dispersion from 1995 are the price of oil and transport costs.

The introduction of the euro should have decreased price dispersion by removing transaction costs and exchange rate risk and by increasing price transparency (Wolszczak-Derlacz 2010). However, the literature is divided in its evaluation of the impact of the euro on price dispersion. Allington, Kattuman, and Waldmann (2004) find that price convergence accelerated due to the euro, while others, including Parsley and Wei (2008), Wolszczak-Derlacz (2010) and Fischer (2012), do not find such a causal effect. Ogrokhina (2015) examines the impact of the common market and the EMU on the price convergence of 120 traded goods. While the Single European Act decreased price differences by 5%, the single currency increased them by 2%. However, the impact of the euro differed across product categories: price convergence was observed for cars, and divergence for food and tobacco. Baye et al. (2006) study the impact of the euro on prices using data collected from a price comparison site (Kelkoo). The data cover prices for 28 products across 7 EU countries. The authors find an increase in prices relative to the non-EU countries.

The observed cross-country differences in prices may be caused by international price discrimination of the producers (differences in mark-ups), or by differences in costs (Verboven 1996). Mark-ups may be influenced by differences in consumer income and taste, while costs by transportation, service wages and so on. Additionally, structural causes, like market structure or differences in taxes (VAT) may cause prices to differ (DG Internal Market 2002).

Price dispersion has been widely analysed in the car market because of the large price differences between EU15 countries. Verboven (1996) shows that international price discrimination, measured by cross-country differences in relative wholesale mark-ups, is among the drivers of price differences. Interestingly, car producers seem to charge a higher mark-up in their domestic market than in foreign markets. An analysis of price convergence in the car market is also provided by Goldberg and Verboven (2005), who find evidence of convergence towards the Law of One Price between 1970 and 2000.

Alessandria and Kaboski (2007) show that income per capita explains half of the aggregate price dispersion across countries. Using a wide sample of US export data, the authors suggest that mark-ups account for 40% of this relationship and that customers in low income countries are more price sensitive. Crucini and Yilmazkuday (2014) analyse the determinants of price dispersion using a panel of retail prices from 79 countries in the period 1990−2005. The authors find that service-sector wage differences almost entirely explain price dispersion at the PPP level. At the level of individual goods and services, wage differences explain around one third of dispersion, with an equal impact of trade related costs. However, the authors acknowledge that the prices of tradable goods are affected by non-tradable elements; therefore, it is difficult to distinguish between costs and mark-ups.

Simonovska (2015) tackles this issue by analysing the relationship between the price of tradable goods sold on-line and per capita income. The author uses a dataset of prices of apparel products produced by a Spanish manufacturer and sold in 29 countries (23 of them EU member states). The examined products are only available on-line and are manufactured and shipped from a single place; therefore, destination-specific price contributions (such as wages) can be ruled out. The author finds that per capita income differences account for one third of the cross-country differences in prices, while shipping costs explain up to a third. More specifically, doubling the per capita income of the buyer’s country leads to an 18% rise in the price level of apparel and footwear. Additionally, the author finds that Eurozone membership decreases price dispersion relative to Spain.

On-line shopping is becoming an important purchasing channel alongside traditional off-line retail. According to Eurostat data, the share of e-commerce in total turnover (without the financial sector) has already reached 15% in the EU15, and in countries such as Great Britain it exceeds 20%. The European Commission, aware of the fundamental importance of the digital economy, has made the completion of the Digital Single Market (DSM) one of its priorities. The aim of the DSM is to ensure cross-border access to on-line activities under conditions of fair competition throughout the EU (European Commission 2015). A potential effect of stronger integration of digital markets between EU member states is a decrease in the cross-country dispersion of on-line prices.

To sum up, the literature suggests that price differences are indeed influenced by both mark-ups and costs. International price discrimination can be classified as third degree price discrimination, in which the seller discriminates among different groups of consumers based on observable heterogeneities (for example, the geographical location of the consumer or income). Theoretical literature shows that third degree price discrimination might increase or decrease overall social welfare compared to uniform pricing. Price discrimination is not harmful to consumers from low income countries and welfare effects of reducing price discrimination will have opposite signs in case of low and high priced markets. Even if the net effect is positive for the whole EU, some compensation mechanism should be in place.

For the analysis of monopoly market, see Cowan (2008), Aguirre, Cowan, and Vickers (2010), Galera, Kinateder, and Mendi (2014), while for duopoly markets refer to Corts (1998), Stole (2003) and Adachi and Matsushima (2014). Malueg and Schwartz (1994) show the necessary conditions for price discrimination to increase the welfare of an economic union, which refer to the convexity of respective demand functions and the overall size of weak and strong markets.

This article addresses an important research gap in the literature. While price dispersion has been examined in the off-line retail across EU countries, on-line retail has been only partially analysed (e.g., Baye et al. 2006). This work provides the first analysis on the drivers of price dispersion across on-line retail in all the EU member states, using data on homogeneous products.

Scope of price dispersion
Web-scraping technique and advantages and disadvantages of scraped data

Cavallo (2015) introduces scraped data as a new source of micro-price information to measure price stickiness. This is only one example of the use of web-scraping techniques, which are gaining attention as a new approach to analysing the behaviour of prices. Lünnemann and Wintr (2011), Gorodnichenko and Talavera (2014) and Gorodnichenko, Sheremirov, and Talavera (2014) find that on-line prices tend to be more flexible and to change by smaller amounts than off-line prices. Existing research, however, has focused mainly on price dynamics and their patterns. To the authors’ best knowledge, there is still a gap in the empirical literature that uses web-scraping to investigate price dispersion between countries. To fill this gap, we investigate price differences in on-line retail in the European Union.

There are a few advantages of using on-line rather than off-line prices. Firstly, collecting data remotely is much cheaper than survey-based methods, especially in a cross-country setting. Secondly, it is possible to include a large number of retailers. This is an important advantage, as the results will not be driven by the pricing strategy of a particular company or retail chain. However, increasing the number of retailers would require designing the web-scraping tool separately for every retailer that uses a different web-page design. To address this issue, we run our web-scraping tool on the data provided by price comparison services.

The list of the price comparisons services is available upon request.

This allows us to limit the complexity of the web-scraping tool significantly while keeping the diversity of on-line retailers. Thirdly, we are able to collect many observations for one product, which should improve the quality of price information in our database. This comes with one disadvantage: as the automated tool collects all information provided by the price comparison service, we observe a ‘heterogeneity’ bias that extends the tails of the price distribution. To address this, we use the medians of prices at the country-product level in our analysis.

By ‘heterogeneity’ bias, we refer to the situation in which the price comparison service, asked about the price of a particular product and lacking proper data, returns price information for the most similar products. Even though we use restrictive cleaning techniques to fully address this concern, we use the medians of the prices in further analysis.

Our approach is similar to the one presented in Cavallo (2015). As we do not intend to analyse the dynamics of prices, we collect our data at one point in time. Our procedure works in three steps. Firstly, we collect information about the most popular products searched in price comparison services and choose only those with the most ‘unified’ names across countries (i.e., the name of the product is the same in at least 9 countries). Secondly, we design a dedicated tool to visit the price comparison service web-pages and to collect information about the product, the price and the price information provider (i.e., the on-line shop included in the price comparison service database). Our program is designed to read the webpage code, recognize the proper tags, defined separately for every price information provider, and store the data. Finally, we run our program in two different modes: using its own IP and using an external server located in the particular country. The last step is designed to ensure that we get the same price information as a local consumer in a particular market.

For more details about markup languages and their tag structure, please see Coombs, Renear, and DeRose (1987).
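The tag-recognition step can be illustrated with a minimal sketch. It is not the authors' tool: the HTML layout, the class names (`offer`, `product-name`, `shop-name`, `price`) and the sample page are hypothetical, and a real price comparison service would need its own tag mapping.

```python
from html.parser import HTMLParser

# Hypothetical class names; every real provider needs its own mapping.
OFFER_CLASS = "offer"
FIELD_CLASSES = {"product-name": "product", "shop-name": "shop", "price": "price"}


class OfferParser(HTMLParser):
    """Collects one dict per offer block found in a page."""

    def __init__(self):
        super().__init__()
        self.offers = []
        self._current = None  # offer being filled, or None outside an offer
        self._field = None    # field whose text is expected next

    def handle_starttag(self, tag, attrs):
        css = dict(attrs).get("class", "")
        if css == OFFER_CLASS:
            self._current = {}
        elif self._current is not None and css in FIELD_CLASSES:
            self._field = FIELD_CLASSES[css]

    def handle_data(self, data):
        if self._current is not None and self._field and data.strip():
            self._current[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        # An offer is stored once all expected fields have been read.
        if self._current is not None and len(self._current) == len(FIELD_CLASSES):
            self.offers.append(self._current)
            self._current = None


def scrape(html):
    """Return the offers in a page, with prices converted to floats."""
    parser = OfferParser()
    parser.feed(html)
    return [{**offer, "price": float(offer["price"])} for offer in parser.offers]


# Made-up page fragment used only for illustration.
SAMPLE = """
<div class="offer"><span class="product-name">Phone X 64GB</span>
<span class="shop-name">shop-a.example</span><span class="price">499.00</span></div>
<div class="offer"><span class="product-name">Phone X 64GB</span>
<span class="shop-name">shop-b.example</span><span class="price">479.90</span></div>
"""
offers = scrape(SAMPLE)
```

Keeping the tag mapping in one dictionary reflects the trade-off discussed in the text: adding a data provider means adding a mapping, not rewriting the parser.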

Data set and results

We collect the data about five of the most popular categories in on-line retail, in particular: fashion (clothing and footwear), cosmetics and healthcare (cosmetics and perfumes), computer games and software, electronics and computer hardware, and household appliances. The full list of products contains 657 different product names divided into categories, which results in 143,490 observations. The full split of the sample by products and by observations is presented in Tab. 1.

The number of products and observations in each product category

| Category | Products: scraped | Products: after cleaning | Products: in final sample | Observations: scraped | Observations: after cleaning | Observations: in final sample |
|---|---|---|---|---|---|---|
| Fashion | 362 | 154 | 33 | 58,410 | 7,713 | 4,272 |
| Cosmetics & Healthcare | 87 | 33 | 24 | 25,472 | 5,419 | 4,736 |
| Household Appliances | 58 | 56 | 35 | 11,643 | 3,955 | 3,181 |
| Electronics and Computer Hardware | 84 | 75 | 43 | 23,697 | 7,503 | 6,666 |
| Computer Games and Software | 66 | 64 | 47 | 24,268 | 7,526 | 6,547 |
| Total | 657 | 382 | 182 | 143,490 | 32,116 | 25,402 |

Source: Authors’ own elaboration.

To ensure that we collect valid price information, we apply a few rigorous filters to our data; for example, we require that every word from the product name (further called the ‘product query’) is included in the description of the price record. These procedures limit the number of products to 382 and the number of observations to 32,116. Finally, we keep only the products that are recorded in at least 9 countries in our sample. The impact of each step on the number of products in every category is presented in detail in Tab. 1. As can be seen, the cleaning procedure affected the data significantly: in the end we keep only around 19.1% of the sample. When working with scraped data, we face a trade-off between the number of data providers (price comparison services, on-line retailers’ webpages) and code complexity. In particular, if we want to extend the number of data providers, we need to define the proper tag structure for every new one. Using data from price comparison services addresses this problem; however, we need to pay more attention to data quality.
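The two filters named in the text (query words contained in the description, and presence in at least 9 countries) can be sketched as follows. The record layout and the case-insensitive whole-word matching are assumptions made for illustration; the source does not specify how words are compared.

```python
from collections import defaultdict

MIN_COUNTRIES = 9  # threshold stated in the text


def matches_query(query, description):
    """True if every word of the product query occurs in the description.

    Assumption: case-insensitive comparison of whitespace-separated words.
    """
    words = description.lower().split()
    return all(word in words for word in query.lower().split())


def clean(records, min_countries=MIN_COUNTRIES):
    """Apply both filters to records shaped like
    {'query': ..., 'description': ..., 'country': ..., 'price': ...}
    (a hypothetical layout)."""
    kept = [r for r in records if matches_query(r["query"], r["description"])]
    countries = defaultdict(set)
    for r in kept:
        countries[r["query"]].add(r["country"])
    return [r for r in kept if len(countries[r["query"]]) >= min_countries]


# Made-up records; the second one describes a different product.
records = [
    {"query": "phone x", "description": "Phone X 64GB black", "country": "PL", "price": 499.0},
    {"query": "phone x", "description": "Phone Y 64GB", "country": "DE", "price": 450.0},
    {"query": "phone x", "description": "New phone x", "country": "FR", "price": 520.0},
]
kept_two = clean(records, min_countries=2)
kept_nine = clean(records)
```

With the toy data, the mismatched record is dropped by the word filter, and the product survives only when the country threshold is lowered to 2.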

Even though at the beginning of data collection we tried to ensure that the data were equally distributed between countries, after all the cleaning procedures we observe that for some of them the total number of observations remains low. The total number of products and observations for each country is presented in Tab. 2. It has to be noted that the numbers remain low for only two countries (Belgium and Portugal), while for the others we obtain satisfactory sample sizes.

The number of products and observations for EU countries

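| Country | # of products | # of observations | Country | # of products | # of observations |
|---|---|---|---|---|---|
| Austria | 143 | 1,183 | Italy | 131 | 910 |
| Belgium | 16 | 37 | Latvia | 115 | 2,447 |
| Bulgaria | 101 | 678 | Lithuania | 92 | 494 |
| Croatia | 91 | 340 | Netherlands | 39 | 380 |
| Czech Republic | 144 | 2,427 | Poland | 137 | 1,805 |
| Denmark | 127 | 844 | Portugal | 36 | 158 |
| Estonia | 58 | 763 | Romania | 141 | 1,152 |
| Finland | 73 | 346 | Slovakia | 152 | 1,963 |
| France | 92 | 766 | Slovenia | 91 | 469 |
| Germany | 70 | 989 | Spain | 72 | 386 |
| Greece | 130 | 1,359 | Sweden | 144 | 2,177 |
| Hungary | 149 | 1,482 | United Kingdom | 131 | 1,342 |
| Ireland | 76 | 505 | Average | 102.0 | 1,016.1 |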

Source: Authors’ own elaboration.

We observe significant price dispersion in the European Union, with respect to both gross and pre-tax retail prices (see Tab. 3 and Tab. 4). We calculate the average price dispersion in three steps. Firstly, as mentioned before, we calculate the median price at the country-product level (for every product and for every country separately). This cleans our data of outliers and the ‘heterogeneity’ bias issue. Secondly, we compute the mean of the median prices for every product, which gives us the average price of the product in the European Union. Finally, we calculate the difference between the median price and the product average price and scale it by the product average price. The results are presented in the first column of Tab. 3. The absolute price deviation in the EU is on average (+/-) 11.2%. In a similar way, we calculate the other dispersion statistics, every time starting from the median price at the country-product level as a proxy for the price of a particular product in the respective country.

The gross (retail) price dispersion in European e-commerce industry

| Category | Subcategory | Average absolute price dispersion | Average # of observations per product | Quantile coefficient of dispersion (0.9-0.1) | Quantile price dispersion (0.9-0.1) [EUR] | Quantile coefficient of dispersion (0.75-0.25) | Quantile price dispersion (0.75-0.25) [EUR] | Coefficient of variation (mean) | Coefficient of variation (median) |
|---|---|---|---|---|---|---|---|---|---|
| Fashion | Clothing | 13.9% | 9.91 | 39.3% | 27.65 | 18.3% | 12.03 | 0.35 | 0.34 |
| Fashion | Footwear | 12.3% | 12.36 | 31.1% | 27.91 | 18.0% | 16.08 | 0.25 | 0.26 |
| Fashion | Accessories | 9.9% | 7.92 | 26.4% | 21.16 | 12.1% | 10.52 | 0.54 | 0.49 |
| Cosmetics & Healthcare | Cosmetics | 12.9% | 12.72 | 36.2% | 11.93 | 19.4% | 4.28 | 1.16 | 1.53 |
| Cosmetics & Healthcare | Perfumes | 12.5% | 12.53 | 35.3% | 22.91 | 20.5% | 13.2 | 0.38 | 0.39 |
| Household Appliances | House | 9.6% | 6.65 | 20.9% | 12.05 | 8.7% | 5.04 | 0.35 | 0.34 |
| Household Appliances | House (>100EUR) | 10.4% | 7.02 | 21.1% | 58.85 | 11.3% | 30.42 | 0.78 | 1.31 |
| Household Appliances | Kitchen | 11.4% | 7.06 | 35.5% | 64.32 | 18.4% | 35.05 | 0.78 | 0.91 |
| Household Appliances | Beauty | 13.1% | 5.51 | 31.8% | 17.24 | 17.8% | 10.1 | 0.51 | 0.5 |
| Electronics and Computer Hardware | Accessories | 10.7% | 10.05 | 31.6% | 14.98 | 19.5% | 9.78 | 0.51 | 0.5 |
| Electronics and Computer Hardware | Laptops and PCs | 11.4% | 8.67 | 28.8% | 162.74 | 16.3% | 94.61 | 0.55 | 0.84 |
| Electronics and Computer Hardware | Monitors | 7.8% | 9.18 | 20.6% | 39.54 | 11.5% | 19.81 | 0.95 | 1.46 |
| Electronics and Computer Hardware | Tablets and E-readers | 9.6% | 7.36 | 24.1% | 73.57 | 14.0% | 43.34 | 0.55 | 0.59 |
| Electronics and Computer Hardware | Consoles | 9.6% | 10.09 | 24.0% | 67.89 | 11.7% | 32.18 | 0.4 | 0.4 |
| Computer Games and Software | Software | 11.7% | 9.11 | 32.4% | 87.89 | 18.6% | 55.35 | 0.7 | 0.81 |
| Computer Games and Software | PC games | 12.3% | 7.46 | 35.5% | 13.29 | 21.4% | 8.09 | 0.39 | 0.36 |
| Computer Games and Software | Console games | 10.7% | 7.44 | 31.7% | 14.74 | 18.2% | 8.5 | 0.26 | 0.25 |
| Total | | 11.2% | 8.93 | 29.8% | | 16.2% | | 0.55 | 0.66 |

Source: Authors' own elaboration.

Price dispersion in the whole product sample, measured by the interquartile range of the median (post-tax) relative price distribution, amounts to 16% for the 75−25 percentile interval. While this is a rather conservative measure, the figure almost doubles (30%) for the 90−10 percentile range. Analysing the variation of the price differences, we see that both approaches, the one based on quantiles and the one based on moments, show that possibilities for price arbitrage exist. In the most extreme cases, the differences exceed 160 EUR in the Laptops and PCs category and 87 EUR for software, measured by quantile price dispersion (the difference between the 90% quantile and the 10% quantile). The scope of relative pre-tax dispersion is very similar, while in absolute terms it is obviously lower (see Tab. 4).
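The three-step calculation of relative price deviations can be sketched in a few lines. The record layout and the toy prices are hypothetical. The percentile spread is computed on relative prices (country median divided by the product average), which is one reading of the quantile measures reported in the tables rather than a confirmed definition.

```python
from collections import defaultdict
from statistics import mean, median, quantiles


def relative_deviations(records):
    """records: iterable of (product, country, price) tuples.

    Returns {(product, country): (median - product average) / product average}.
    """
    prices = defaultdict(list)
    for product, country, price in records:
        prices[(product, country)].append(price)
    # Step 1: median price for every country-product cell.
    med = {key: median(values) for key, values in prices.items()}
    # Step 2: mean of the country medians for every product.
    by_product = defaultdict(list)
    for (product, _), value in med.items():
        by_product[product].append(value)
    avg = {product: mean(values) for product, values in by_product.items()}
    # Step 3: deviation from the product average, scaled by that average.
    return {(p, c): (m - avg[p]) / avg[p] for (p, c), m in med.items()}


def average_absolute_dispersion(devs):
    """Mean of the absolute relative deviations."""
    return mean(abs(d) for d in devs.values())


def quantile_range(devs, lo=25, hi=75):
    """Spread between two percentiles of relative prices (1 + deviation)."""
    cuts = quantiles([1 + d for d in devs.values()], n=100, method="inclusive")
    return cuts[hi - 1] - cuts[lo - 1]


# Toy data: one product, three countries, several shops in one of them.
records = [
    ("A", "PL", 90.0), ("A", "PL", 110.0), ("A", "PL", 100.0),
    ("A", "DE", 120.0),
    ("A", "FR", 80.0),
]
devs = relative_deviations(records)
```

In the toy data the country medians are 100, 120 and 80, so the product average is 100 and the deviations are 0, +20% and -20%.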

The pre-tax price dispersion in European e-commerce industry (VAT excluded)

| Category | Subcategory | Average absolute price dispersion | Average # of observations per product | Quantile coefficient of dispersion (0.9-0.1) | Quantile price dispersion (0.9-0.1) [EUR] | Quantile coefficient of dispersion (0.75-0.25) | Quantile price dispersion (0.75-0.25) [EUR] | Coefficient of variation (mean) | Coefficient of variation (median) |
|---|---|---|---|---|---|---|---|---|---|
| Fashion | Clothing | 14.50% | 9.83 | 39.53% | 21.66 | 20.38% | 10.55 | 0.36 | 0.34 |
| Fashion | Footwear | 12.28% | 12.34 | 31.35% | 22.06 | 17.76% | 12.36 | 0.26 | 0.27 |
| Fashion | Accessories | 9.85% | 7.92 | 24.60% | 15.62 | 13.45% | 8.90 | 0.52 | 0.47 |
| Cosmetics & Healthcare | Cosmetics | 12.51% | 12.86 | 34.11% | 9.26 | 18.51% | 3.70 | 1.17 | 1.49 |
| Cosmetics & Healthcare | Perfumes | 12.46% | 12.57 | 35.35% | 17.41 | 18.93% | 8.99 | 0.37 | 0.38 |
| Household Appliances | House | 10.96% | 6.38 | 25.05% | 14.66 | 12.19% | 7.90 | 0.42 | 0.39 |
| Household Appliances | House (>100EUR) | 8.83% | 7.42 | 22.49% | 52.58 | 13.12% | 29.11 | 0.74 | 1.22 |
| Household Appliances | Kitchen | 11.58% | 6.96 | 35.26% | 52.13 | 19.79% | 30.31 | 0.75 | 0.83 |
| Household Appliances | Beauty | 13.18% | 5.61 | 33.12% | 14.14 | 18.07% | 7.91 | 0.50 | 0.49 |
| Electronics and Computer Hardware | Accessories | 10.38% | 9.98 | 29.74% | 11.47 | 18.55% | 7.33 | 0.50 | 0.49 |
| Electronics and Computer Hardware | Laptops and PCs | 11.50% | 8.67 | 28.34% | 129.80 | 16.13% | 77.04 | 0.55 | 0.84 |
| Electronics and Computer Hardware | Monitors | 7.69% | 9.18 | 20.01% | 29.64 | 10.79% | 16.84 | 0.96 | 1.47 |
| Electronics and Computer Hardware | Tablets and E-readers | 9.93% | 7.33 | 28.01% | 67.71 | 13.59% | 32.39 | 0.55 | 0.60 |
| Electronics and Computer Hardware | Consoles | 10.08% | 10.07 | 25.05% | 55.15 | 12.73% | 28.22 | 0.41 | 0.40 |
| Computer Games and Software | Software | 11.74% | 9.14 | 32.08% | 67.40 | 17.00% | 40.06 | 0.68 | 0.78 |
| Computer Games and Software | PC games | 12.63% | 7.52 | 36.46% | 10.63 | 21.04% | 6.05 | 0.38 | 0.35 |
| Computer Games and Software | Console games | 11.04% | 7.46 | 33.45% | 12.04 | 19.06% | 6.76 | 0.27 | 0.26 |
| Total | | 11.2% | 8.90 | 30.24% | | 16.53% | | 0.55 | 0.65 |

Source: Authors' own elaboration.

Since the results presented above show significant price dispersion in on-line retail, we compare our data with the Eurostat price level data (specifically, the prc_ppp_ind variable from the Eurostat database for 2014). We expect a significant difference between our levels and the Eurostat levels because of the differences in the products included in the samples (see Fig. 1). However, the magnitude of dispersion is similar in both datasets. This shows that the analysed phenomenon is not exclusive to on-line retail.

Fig. 1

Price dispersion in European e-commerce industry compared with Eurostat data

Source: Authors’ own elaboration.

Our data confirm significant price dispersion in the European e-commerce industry. Price dispersion has previously been documented in the economic literature, as shown in section 2, and Eurostat provides comprehensive data confirming the phenomenon. It can be argued that cross-border transactions in off-line retail involve high transaction costs, whereas in on-line retail these costs are mostly limited. Therefore, on-line price data seem more appropriate for answering the question about the main factors behind price dispersion.

Price dispersion pattern

The economic literature discussed in section 2 suggests at least three broad groups of factors that influence the magnitude of price dispersion: (a) supply side factors such as the cost of manufacturing, sales and distribution of goods, (b) demand side factors such as preferences and, more importantly, the purchasing power of consumers, and (c) institutional factors such as differences in the level of competition in the market or in taxation. It should be noted that although in most instances items are produced in developing countries and imported to the EU, factor prices in particular member states still affect the economic conditions under which on-line shops operate, as sales, distribution and marketing generate the bulk of costs, which are borne domestically. Moreover, there are large differences among EU member states with respect to contract law regarding, for example, seller liability or the scope of consumer protection, which will most likely be reflected in prices.

In the following sections, we verify to what extent supply and demand side factors systematically influence on-line price dispersion among the EU states. We start this analysis by looking at various factor prices available in Eurostat, such as the unit labour cost, the unit cost of energy and fuels, and capital costs. We expect these supply side variables to affect prices directly. In addition, two demand side variables are considered: annual net salary in the private service sector and GDP per capita. We also include the standard VAT rates that generally apply to the product categories we analyse.

Although VAT rules are not harmonized across the EU, reduced rates apply usually to books and baby clothing and equipment – all of which have not been considered in our study.

The scale of differences in all these variables among the Member States of the European Union is given in Tab. 5.

Descriptive statistics of price dispersion determinants in the EU countries

| Variable | mean | standard deviation | median | min | max | range | skew | kurtosis |
|---|---|---|---|---|---|---|---|---|
| Real GDP per capita [EUR ths] | 25.56 | 14.85 | 19.11 | 4.92 | 49.36 | 44.44 | 0.2 | -1.64 |
| Unit cost of labour [EUR/h] | 18.67 | 12.01 | 15.6 | 3.8 | 40.3 | 36.5 | 0.45 | -1.45 |
| Average annual net salary [EUR/full time] | 8.99 | 5.78 | 6.83 | 1.95 | 18.97 | 17.02 | 0.43 | -1.51 |
| Electric energy price [EUR/kWh] | 0.18 | 0.06 | 0.17 | 0.09 | 0.3 | 0.21 | 0.64 | -0.46 |
| Euro95 petrol price [EUR/litre] | 1.31 | 0.14 | 1.34 | 1.1 | 1.57 | 0.47 | 0.2 | -1.21 |
| Diesel fuel price [EUR/litre] | 1.24 | 0.1 | 1.23 | 1.13 | 1.54 | 0.41 | 1.18 | 0.87 |
| Cost of capital, interest rate [%] (a) | 1.8 | 1.59 | 1.14 | 0.52 | 7.81 | 7.29 | 2.22 | 5.55 |
| Standard VAT rate [%] (b) | 22.04 | 2.07 | 21 | 19 | 27 | 8 | 0.6 | -0.69 |
| Population density | 130.34 | 111.73 | 103.41 | 17.98 | 500.57 | 482.59 | 1.8 | 2.96 |

Source: Eurostat, average annual data (2014), except for (a) ECB (status as at 31/10/2015) and (b) DG TAXUD (status as at 1/09/2015).

Results of the estimation

At present, we can still observe considerable differentiation in the EU with regard to the factors presented in Tab. 5. To assess in a statistically meaningful way which of those factors have the strongest influence on the dispersion of on-line prices in the EU countries, we have conducted a linear regression analysis. As the dependent variable, we took the relative price deviation, which is closely related to interquartile price dispersion but takes both negative and positive values around the average of the median prices calculated for the countries where a given product was supplied. More specifically, we have estimated the following model:

$$PD_{i,j} = f\left( SS_{i,j}, DS_{i,j}, PC_{k} \right)$$

where $PD_{i,j}$ is the percent deviation of the median price of product i in country j from the average of median prices for all countries where the product was offered on-line. SS and DS are vectors of supply and demand side factors, and PC are product subcategories. A detailed list of variables is provided in Tab. 6 below.
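Since the text describes the estimation as a linear regression, one way to write the function f out explicitly is the additive form below. This is a sketch under stated assumptions: the intercept $\alpha$, the coefficient vectors $\beta$ and $\gamma$, the subcategory effects $\delta_{k}$ and the error term $\varepsilon_{i,j}$ are notation introduced here for illustration and do not appear in the source.

$$PD_{i,j} = \alpha + \beta^{\top} SS_{i,j} + \gamma^{\top} DS_{i,j} + \sum_{k} \delta_{k} PC_{k} + \varepsilon_{i,j}$$

Under this reading, each element of $\beta$ or $\gamma$ gives the change in the relative price deviation associated with a one-unit change in the corresponding factor, holding the other factors and the product subcategory fixed.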

List of variables used in regression analysis

| Name of variable in the model | Variable description |
|---|---|
| PD: [percent_deviation] | Percentage deviation of the median of product prices in a given country from the average of medians for all countries, based on the web-scraping study of results from price comparison websites |
| DS: [GDP_pc] | Real GDP per capita in EUR thousands, at the end of 2014 |
| DS: [net_earnings] | Average annual net salary in 2014 |
| SS: [pop_density] | Population density, a proxy for transportation costs |
| IE: [vat_stdrate] | Current level of standard VAT rate |
| SS: [ele_cost] | Unit cost of electric energy, average annual value in 2014 |
| SS: [E95_price] | Euro95 petrol price, average annual value in 2014 |
| SS: [diesel_price] | Diesel oil price, average annual value in 2014 |
| SS: [unit_lab_cost] | Unit cost of labour, average annual value in 2014 |
| SS: [interest_rate] | Current interest rate on 10-year Treasury bonds in the secondary market, a capital price index |
| PC: [Games for PC], [Games for consoles], [Consoles], [Cosmetics], [Laptops and computers], [Small household appliances for the home < EUR 100], [Small household appliances for the home > EUR 100], [Small household appliances for the kitchen], [Monitors], [Footwear], [Perfumes], [Software], [Tablets and e-book readers], [Clothing], [Body care appliances] | Binary variables for subcategories of products |

Source: Authors’ own elaboration.

Prior to estimation, we inspected the relevance of our three categorical variables (Category, Subcategory and Country) by analysing the differences among group means of the dependent variable. We implement various techniques: visual assessment of boxplots (Fig. A1−A3 in the Appendix, respectively), an analysis of variance model (Tab. A1−A3 in the Appendix) and Tukey’s range test (Fig. A4−A6 in the Appendix). All methods suggest that there are no significant differences between product categories and subcategories, while the values differ across countries; for example, in the case of Poland and Portugal, the average deviation of median prices is noticeably negative (Fig. A3 in the Appendix). We address this issue by including GDP per capita in the regression equation. The correlation between our dependent variable and GDP per capita in the whole sample is 18%. To verify that this variable captures the cross-country differences in price deviations, we check the correlation between differences in the dependent variable and differences in GDP per capita. It is significantly positive (54%), which indicates that including GDP per capita in the regression addresses the cross-country differences in the dependent variable. We also run Levene’s test (Tab. A4−A6 in the Appendix), which shows that we need to address the problem of heteroscedasticity.

Fig. A1

Boxplots - Percentage deviations of median prices – Product Category Variables

Fig. A2

Boxplots - Percentage deviations of median prices − Product Subcategory Variables

Fig. A3

Boxplots – Percentage deviations of median prices − Country Variables

Fig. A4

Tukey’s range test − Product Category Variables

Fig. A5

Tukey’s range test − Product Subcategory Variables

Fig. A6

Tukey’s range test − Country Variables

Result of Levene’s Test for Homogeneity of Variance − Country Variables

| | Df | F value | Pr(>F) |
|---|---|---|---|
| group | 24 | 2.23 | 0.0005 |
| | 2526 | | |

Starting with the full model specification (Model 1), we face the problem of collinearity of independent variables. Based on the variance inflation factor test (VIF test), we drop the unit labour cost from the model (Model 2, see Tab. A7 in the Appendix).

To address the problem of collinearity we could conduct the factor analysis; however, we would lose the interpretability of the model.

Results of the VIF Test

variable         Model 1   Model 2   Model 3   Model 4
diesel_price     4.84      4.78
E95_price        5.58      5.56      2.25      2.25
ele_cost         4.02      3.24
GdpR_pc_rep      7.70      7.29      2.23      2.23
interest_rate    2.43      1.97      1.50      1.50
netearnings      38.90     5.78
pop_density      1.55      1.55      1.35      1.35
ulab_cost        40.29
vat_stdrate      1.56      1.44
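The VIF values above can be read through the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one explanatory variable on all the others. The short sketch below inverts that formula for the Model 1 column. The cut-off of 10 is a common rule of thumb and an assumption here, because the text does not state the threshold that was applied.

```python
# VIF values reported for Model 1 in the table above
vif_model1 = {
    "diesel_price": 4.84, "E95_price": 5.58, "ele_cost": 4.02,
    "GdpR_pc_rep": 7.70, "interest_rate": 2.43, "netearnings": 38.90,
    "pop_density": 1.55, "ulab_cost": 40.29, "vat_stdrate": 1.56,
}


def implied_r_squared(vif):
    """Invert VIF = 1 / (1 - R^2) to recover the auxiliary-regression R^2."""
    return 1.0 - 1.0 / vif


THRESHOLD = 10.0  # assumed rule-of-thumb cut-off, not stated in the text

# Variables whose VIF exceeds the assumed threshold, with their implied R^2
flagged = {
    name: round(implied_r_squared(v), 3)
    for name, v in vif_model1.items()
    if v > THRESHOLD
}
print(flagged)
```

Under this assumed threshold only `netearnings` and `ulab_cost` are flagged in Model 1. This is in line with dropping unit labour cost first, after which the VIF of `netearnings` falls to 5.78 in Model 2.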

Next, we carry out the model selection procedure using two approaches: frequentist model averaging and stepwise regression. Ignoring model selection uncertainty can lead to underreported variability and overly optimistic confidence sets. Frequentist model averaging accounts for this uncertainty and incorporates it into the estimation process (Wang, Zhang, and Zou 2009). The results are reported in Tab. A8. In the restricted form of the model (Model 3) we include only variables with relative importance > 0.5.

Frequentist model averaging − relative variable importance

                 Importance   N containing models
GdpR_pc_rep      1            128
pop_density      1            128
E95_price        0.92         128
interest_rate    0.55         128
ele_cost         0.45         128
diesel_price     0.34         128
vat_stdrate      0.29         128
netearnings      0.28         128
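The counts and the selection rule in the table above can be checked with a few lines of Python. With eight candidate regressors there are 2^8 = 256 possible models, and each regressor appears in half of them, which matches the reported 128. Relative importance is commonly computed as the sum of model weights over the models containing a variable. The weights are not reported here, so this sketch only reuses the published importance values and applies the > 0.5 rule.

```python
from itertools import combinations

# Candidate regressors after dropping ulab_cost (names as in the table above)
candidates = [
    "GdpR_pc_rep", "pop_density", "E95_price", "interest_rate",
    "ele_cost", "diesel_price", "vat_stdrate", "netearnings",
]

# Every subset of the candidates (including the empty set) is one model
all_models = [
    set(subset)
    for size in range(len(candidates) + 1)
    for subset in combinations(candidates, size)
]
n_models = len(all_models)
n_containing = {v: sum(v in m for m in all_models) for v in candidates}

# Relative importance values as reported in the table above
importance = {
    "GdpR_pc_rep": 1.0, "pop_density": 1.0, "E95_price": 0.92,
    "interest_rate": 0.55, "ele_cost": 0.45, "diesel_price": 0.34,
    "vat_stdrate": 0.29, "netearnings": 0.28,
}

# Selection rule from the text: keep variables with importance > 0.5
model3_vars = [v for v in candidates if importance[v] > 0.5]
print(n_models, n_containing["E95_price"], model3_vars)
```

The four retained variables are the ones that carry VIF values in the Model 3 column of the VIF table.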

For robust model selection, we also implement a stepwise regression procedure. Starting with the full model and using the AIC, it selects the same variables as the previous approach (reported as Model 4). The standard diagnostic tests (the Breusch-Pagan and Goldfeld-Quandt tests for heteroscedasticity, Tab. A9−A10 in the Appendix, and the Durbin-Watson test for autocorrelation, Tab. A11) show no heteroscedasticity; however, we detect autocorrelated errors. Therefore, the final model is estimated with a robust enhanced White variance-covariance estimator. The coefficients of the final model are presented in Tab. 7.
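As an illustration of the autocorrelation diagnostic mentioned above, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares. The sketch below uses hypothetical residuals, not the residuals of the estimated model. Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the residual sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly, i.e. positively autocorrelated
toy_residuals = [0.5, 0.6, 0.4, -0.3, -0.5, -0.4]
print(round(durbin_watson(toy_residuals), 2))
```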

The regression results shed light on the pattern of observed price dispersion among the EU countries. Its magnitude depends on the level of GDP per capita (‘GdpR_pc_rep’ variable), population density (‘pop_density’) and petrol price (‘E95_price’). The relationship between price dispersion and GDP per capita follows the expected direction: an increase in GDP per capita of EUR 1 thousand is associated with an increase of 0.14 percentage points in the deviation of the price in a given country from the EU median price (estimated coefficient: 0.0013864). The influence of the GDP variable can be illustrated with the example of Poland and the United Kingdom, whose GDP per capita levels are EUR 11 thousand and EUR 41 thousand, respectively. Based on the model results, the income difference alone implies that prices in Poland are on average 4.2 percentage points lower than in the United Kingdom. The other main driver of price dispersion is the petrol price: a EUR 1 increase in the average petrol price is associated with a deviation from EU average prices that is 10.5 percentage points higher (coefficient: 0.1049988399).
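The figures quoted above follow directly from the reported coefficients. The sketch below reproduces the arithmetic, assuming a linear effect with all other regressors held constant. The variable names are illustrative.

```python
# Coefficients of the final model as reported in the text
coef_gdp = 0.0013864        # per EUR thousand of GDP per capita
coef_petrol = 0.1049988399  # per EUR 1 of average petrol price

# GDP per capita in EUR thousand, as given in the text
gdp_poland, gdp_uk = 11, 41

# Effects expressed in percentage points (coefficient * change * 100)
effect_per_thousand_pp = coef_gdp * 1 * 100
poland_uk_gap_pp = coef_gdp * (gdp_uk - gdp_poland) * 100
petrol_effect_pp = coef_petrol * 1 * 100

print(f"{effect_per_thousand_pp:.2f}")  # 0.14
print(f"{poland_uk_gap_pp:.1f}")        # 4.2
print(f"{petrol_effect_pp:.1f}")        # 10.5
```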

Results of linear regression for the 'percent_devation' variable

                Model 1                             Model 2                             Model 3/4                           Final Model
                Estimates Conf. Int.       p-values Estimates Conf. Int.       p-values Estimates Conf. Int.       p-values Estimates Conf. Int.       p-values
(Intercept)     -0.0928   -0.20 to 0.01    0.077    -0.1419   -0.24 to -0.05   0.003    -0.1397   -0.20 to -0.08   <.001    -0.1517   -0.21 to -0.09   <.001
pop_density     -0.0002   -0.00 to -0.00   <.001    -0.0002   -0.00 to -0.00   <.001    -0.0002   -0.00 to -0.00   <.001    -0.0002   -0.00 to -0.00   <.001
netearnings     -0.0064   -0.01 to -0.00   0.039    0.0003    -0.00 to 0.00    0.829
ulab_cost       0.0035    0.00 to 0.01     0.02
vat_stdrate     -0.0013   -0.00 to 0.00    0.385    -0.0003   -0.00 to 0.00    0.831
ele_cost        -0.2056   -0.39 to -0.02   0.033    -0.1069   -0.28 to 0.06    0.217
E95_price       0.0733    -0.02 to 0.16    0.108    0.08      -0.01 to 0.17    0.079    0.087     0.03 to 0.14     0.003    0.105     0.05 to 0.16     <.001
diesel_price    0.0093    -0.10 to 0.11    0.862    0.0227    -0.08 to 0.13    0.668
GdpR_pc_rep     0.0021    0.00 to 0.00     <.001    0.0018    0.00 to 0.00     <.001    0         0.00 to 0.00     <.001    0.0014    0.00 to 0.00     <.001
interest_rate   0.0057    0.00 to 0.01     0.019    0.0032    -0.00 to 0.01    0.139    0.0027    -0.00 to 0.01    0.156
Observations    2551                                2551                                2551                                2551
R2 / adj. R2    .047 / .043                         .045 / .042                         .043 / .042

Source: Authors' own elaboration.

Summary and conclusions

This study applies a new way of collecting price data to measure on-line price dispersion across the EU for a broad variety of homogeneous commodities. We have obtained a unique snapshot of price dispersion based on data collected from price comparison sites using web scraping. We have related the observed pattern of price dispersion to selected economic characteristics of the member states to gain better insight into the factors that determine on-line prices. Our study yields two main results.

First, price dispersion in the whole product sample, measured by the interquartile range of the distribution of median relative prices (pre-tax and post-tax), amounts to 16% for the 75−25 percentile interval. While this is a rather conservative measure, the figure almost doubles (30%) for the 90−10 percentile range. We note that both figures show average price dispersion for the whole product sample. The sample itself is composed of a large variety of branded items representing all main product categories. We believe that this heterogeneity adds to the robustness of our results. The magnitude of price dispersion differs considerably between product categories and subcategories, ranging from roughly 20% for electrical household appliances to 40% for clothing.

Secondly, price levels grow with GDP per capita and petrol price. According to the regression results, an increase in real GDP per capita of EUR 1 thousand raises the deviation of the median price by 0.14 percentage points relative to the average level for the EU. In turn, an increase in the average petrol price of EUR 1 raises the deviation by 10.5 percentage points. We note that the observed link between per capita income and price dispersion can be consistently explained by at least two mechanisms. The first is classical third-degree price discrimination based on differences in demand elasticity induced by income levels. The second is a supply-side mechanism related to cost differences. Unfortunately, because we lack data on quantities, we could not separate the two effects.

The lack of sales data makes it impossible to account for market competition, which is also an important determinant of price dispersion. There is some evidence that the market power of merchants in the off-line channel is often transmitted to e-commerce.

Nevertheless, we argue that it is reasonable to assume that both mechanisms are at work. Consequently, at least part of the observed price dispersion is induced by strategic price discrimination between country markets within the EU. This has an important bearing on new policy interventions aiming to reduce price dispersion in the Digital Single Market.

While price dispersion between EU countries is unavoidable because of objective differences in taxes, wages and capital costs, as well as the lack of harmonization of contract laws, the main policy question remains whether its scope is too large and how it can be effectively reduced. Some policy interventions are now being debated within the EC, in response to the public consultations on geographically-based restrictions initiated in September 2015. The new regulation will reinforce equal treatment of cross-border consumers with the so-called ‘shop like a local’ principle, and may go even further and impose an obligation to serve customers across the EU while giving the trader the right to charge extra delivery costs. What would be the effects of such regulation?

Lowering barriers to cross-border trade will lower the costs of arbitrage for consumers and thus improve their ability to counteract price discrimination by distributors at the European level. It is, however, unlikely that the equal treatment principle would have any effect on the non-strategic component of price dispersion, because on-line retailers act in competitive environments and hence all existing differences in operational costs between countries will be passed on to retail prices.

EC interventions to lower barriers to cross-border trade, such as reducing geo-blocking restrictions or liberalizing the parcel delivery market, would most likely decrease price dispersion by reducing the incentives for strategic price discrimination at the wholesale level. This would eventually decrease prices in high income countries, bringing positive welfare effects for customers. On the other hand, prices in lower income countries are expected to increase, because in this way traders can reduce the potential benefits of arbitrage for customers. Hence, the welfare effect for customers from lower income countries will most likely be negative. Producers’ surpluses are expected to change in the opposite directions.

The major conclusion from this discussion is that if price dispersion results from strategic (income-induced) price discrimination at the producer and/or distributor level, then any policy intervention reducing trade barriers would bring adverse distributional effects on customer and producer welfare in particular member states. As the economic literature suggests, there is no a priori guarantee that the net effect will be positive (Malueg and Schwartz 1994). Therefore, we argue that any policy intervention targeting price dispersion in the Digital Single Market should be carried out very carefully.