Open access

The use of web-scraped data to analyze the dynamics of footwear prices

06 Aug 2021

Aim/purpose – Web scraping is a technique for automatically extracting data from websites. With the rise of online shopping, it makes it possible to collect information on the prices of goods sold by retailers such as supermarkets and internet shops. This study examines the possibility of using web-scraped data from one clothing store. It aims to compare known price index formulas as applied to web-scraped data and to verify their sensitivity to the choice of data filter type.

Design/methodology/approach – The author uses price data scraped from one of the biggest online shops in Poland. The data were obtained as part of the eCPI (electronic Consumer Price Index) project conducted by the National Bank of Poland. Three types of products were selected for the analysis (female ballerinas, male shoes, and male oxfords), and their prices were compared over a period of more than one year. Six price indexes were used in the calculations: the Jevons and Dutot indexes, together with their chained and GEKS (an acronym formed from the names of its creators, Gini–Éltető–Köves–Szulc) versions. In addition to the analysis conducted on the full data set, the author introduced filters to remove outliers.
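As an illustration of the six index variants named above, the following is a minimal sketch and not the author's implementation. It assumes unweighted prices stored per period as a mapping from a product identifier to a price, computes each bilateral index on the products present in both periods, and uses made-up prices. The function names (`matched`, `jevons`, `dutot`, `chained`, `geks`) are hypothetical, and the sketch assumes every pair of periods shares at least one product.

```python
import math

def matched(p0, p1):
    """Product ids that have a price in both periods."""
    return sorted(set(p0) & set(p1))

def jevons(p0, p1):
    """Geometric mean of price relatives over matched products."""
    ids = matched(p0, p1)
    return math.exp(sum(math.log(p1[i] / p0[i]) for i in ids) / len(ids))

def dutot(p0, p1):
    """Ratio of arithmetic mean prices over matched products."""
    ids = matched(p0, p1)
    return sum(p1[i] for i in ids) / sum(p0[i] for i in ids)

def chained(index, periods, t):
    """Product of period-on-period links from period 0 up to period t."""
    result = 1.0
    for s in range(1, t + 1):
        result *= index(periods[s - 1], periods[s])
    return result

def geks(index, periods, t):
    """GEKS index from period 0 to t: geometric mean, over every link
    period k in the window, of index(0, k) * index(k, t).
    Assumes the bilateral index satisfies time reversal."""
    n = len(periods)
    log_sum = sum(
        math.log(index(periods[0], periods[k]) * index(periods[k], periods[t]))
        for k in range(n)
    )
    return math.exp(log_sum / n)

# Hypothetical prices: product "b" disappears and "d" appears (product churn).
periods = [
    {"a": 100.0, "b": 200.0, "c": 50.0},
    {"a": 110.0, "b": 180.0, "d": 80.0},
    {"a": 121.0, "d": 80.0, "c": 55.0},
]

last = len(periods) - 1
for name, index in (("Jevons", jevons), ("Dutot", dutot)):
    print(name,
          "direct:", round(index(periods[0], periods[last]), 4),
          "chained:", round(chained(index, periods, last), 4),
          "GEKS:", round(geks(index, periods, last), 4))
```

In this toy data the direct, chained, and GEKS values differ because each uses a different set of matched products, which is the kind of divergence the study examines on real scraped prices.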

Findings – Clothing and footwear are considered among the most difficult groups of goods for measuring price changes because of high product churn, which undermines the use of the traditional Jevons and Dutot indexes. However, chained and GEKS indexes can be used instead. Still, these indexes are fairly sensitive to large price changes. As observed for both product groups, the GEKS and chained versions of the indexes gave different results, which suggests that, although they yield promising results, they may be better suited to other COICOP (Classification of Individual Consumption by Purpose) groups.

Research implications/limitations – The findings show that applying filters did not significantly reduce the difference between the price indexes based on the GEKS and chain formulas.

Originality/value/contribution – The use of web-scraped data is a fairly new topic in the literature. Research on the applicability of different price indexes provides useful insights for the future use of these data by statistical offices.