A Product Match Adjusted R Squared Method for Defining Products with Transaction Data
Published Online: Jun 22, 2021
Page range: 411 - 432
Received: Jun 01, 2019
Accepted: Apr 01, 2020
DOI: https://doi.org/10.2478/jos-2021-0018
© 2021 Antonio G. Chessa, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
The occurrence of relaunches of consumer goods at the barcode (GTIN) level is a well-known phenomenon in transaction data of consumer purchases. GTINs of disappearing and reintroduced items have to be linked in order to capture possible price changes.
This article presents a method that groups GTINs into strata (‘products’) by balancing two measures. The first is an explained variance (R squared) measure for the ‘homogeneity’ of GTINs within products; the second expresses the degree to which products can be ‘matched’ over time with respect to a comparison period. The resulting product ‘match adjusted R squared’ (MARS) combines explained variance in product prices with product match over time, so that different stratification schemes can be ranked according to the combined measure.
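The trade-off between homogeneity and match can be illustrated with a minimal Python sketch. It is not the article's exact definition of MARS: the abstract does not state the formula, so the sketch assumes a quantity-weighted R squared, a match measure equal to the share of current-period quantities sold in products that also existed in the comparison period, and a combined score equal to their product. All function names, GTINs, prices and quantities are hypothetical.

```python
from collections import defaultdict


def r_squared(items, product_of):
    """Quantity-weighted share of price variance explained by the grouping.

    items: list of (gtin, price, quantity) for the current period.
    product_of: dict mapping each GTIN to its product label.
    """
    total_q = sum(q for _, _, q in items)
    grand_mean = sum(p * q for _, p, q in items) / total_q
    qty = defaultdict(float)  # quantity sold per product
    val = defaultdict(float)  # turnover per product
    for gtin, p, q in items:
        k = product_of[gtin]
        qty[k] += q
        val[k] += p * q
    total_ss = sum(q * (p - grand_mean) ** 2 for _, p, q in items)
    between_ss = sum(qty[k] * (val[k] / qty[k] - grand_mean) ** 2 for k in qty)
    # With no price variation at all, treat the grouping as fully homogeneous.
    return between_ss / total_ss if total_ss > 0 else 1.0


def match_share(items, product_of, base_gtins):
    """Share of current quantities in products also sold in the comparison period."""
    base_products = {product_of[g] for g in base_gtins}
    total_q = sum(q for _, _, q in items)
    matched_q = sum(q for g, _, q in items if product_of[g] in base_products)
    return matched_q / total_q


def combined_score(items, product_of, base_gtins):
    """Assumed combination: R squared multiplied by the match share."""
    return r_squared(items, product_of) * match_share(items, product_of, base_gtins)


# Hypothetical data: GTIN "B" disappears and is relaunched as "C".
base_gtins = {"A", "B", "D"}
current = [("A", 1.0, 10), ("C", 1.2, 10), ("D", 3.0, 10)]

schemes = {
    "gtin": {g: g for g in "ABCD"},  # every GTIN is its own product
    "linked": {"A": "P1", "B": "P2", "C": "P2", "D": "P3"},  # relaunch linked
    "single": {g: "ALL" for g in "ABCD"},  # one product for everything
}
scores = {name: combined_score(current, m, base_gtins) for name, m in schemes.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions the GTIN-level scheme is perfectly homogeneous but loses the relaunched item from the match, the single-product scheme matches everything but explains no price variance, and the scheme that links the relaunch scores highest on both counts.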
MARS has been applied to a broad range of product types. Individual GTINs are suitable as products for food and beverages, but not for product types with higher rates of churn, such as clothing, pharmacy products and electronics. In these cases, products are defined as combinations of characteristics, so that GTINs with the same characteristics are grouped into the same product. Future research will focus on developing MARS further, for example on attribute selection when data sets contain large numbers of variables.
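Defining products as combinations of characteristics, as described above, amounts to grouping GTINs on a tuple of attribute values. The following Python sketch shows the idea; the attribute names, values and the function define_products are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def define_products(gtin_attributes, selected):
    """Group GTINs that share the same values on the selected attributes.

    gtin_attributes: dict mapping GTIN -> dict of attribute name -> value.
    selected: attribute names whose combination defines a product.
    """
    groups = defaultdict(list)
    for gtin, attrs in gtin_attributes.items():
        key = tuple(attrs[name] for name in selected)
        groups[key].append(gtin)
    return dict(groups)


# Hypothetical clothing items; the GTINs and attribute values are made up.
shirts = {
    "0001": {"brand": "X", "type": "t-shirt", "sleeve": "short", "colour": "red"},
    "0002": {"brand": "X", "type": "t-shirt", "sleeve": "short", "colour": "blue"},
    "0003": {"brand": "X", "type": "t-shirt", "sleeve": "long", "colour": "red"},
}

coarse = define_products(shirts, ["brand", "type"])
fine = define_products(shirts, ["brand", "type", "sleeve"])
```

Adding an attribute to the selection splits products into smaller, more homogeneous groups, which tends to reduce the degree to which they can be matched over time. This is the balance that a combined measure is meant to rank.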