Accès libre

Estimating Weights for Web-Scraped Data in Consumer Price Indices

Journal of Official Statistics's Cover Image
Journal of Official Statistics
Special Issue on Price Indices in Official Statistics
À propos de cet article

Citez

In recent years, there has been much interest among national statistical agencies in using web-scraped data in consumer price indices, potentially supplementing or replacing manually collected price quotes. Yet one challenge that has received very little attention to date is the estimation of expenditure weights in the absence of quantity information, which would enable the construction of weighted item-level price indices. In this article we propose the novel approach of predicting sales quantities from their ranks (for example, when products are sorted ‘by popularity’ on consumer websites) via appropriate statistical distributions. Using historical transactional data supplied by a UK retailer for two consumer items, we assessed the out-of-sample accuracy of the Pareto, log-normal and truncated log-normal distributions, finding that the last of these resulted in an index series that most closely approximated an expenditure-weighted benchmark. Our results demonstrate the value of supplementing web-scraped price quotes with a simple set of retailer-supplied summary statistics relating to quantities, allowing statistical agencies to realise the benefits of freely available internet data whilst placing minimal burden on retailers. However, further research would need to be undertaken before the approach could be implemented in the compilation of official price indices.

eISSN:
2001-7367
Langue:
Anglais
Périodicité:
4 fois par an
Sujets de la revue:
Mathematics, Probability and Statistics