With the rapid development of online information and e-commerce, business models have undergone considerable changes to meet the rapidly changing consumer preferences. Over the past two decades, the fashion industry has been transforming, and changes in consumption patterns have forced retailers to reduce costs and increase design flexibility and speed in the market [1]. The main trends in the fashion industry’s supply chain are as follows: (1) Quick response: because customers are decision makers in the retail environment, a supply chain that can quickly respond to customer expectations is required [2]. (2) Short production: fashion trends combined with the seasonal nature of products usually result in a shorter life cycle [3]. According to the successful fashion retailer Zara, a year can be divided into 20 seasons. (3) Vertical integration: retailers can eliminate intermediaries, communicate consumer requirements directly with manufacturers, and increase production efficiency. Cooperation between retailers and manufacturers promotes product development, improves product quality, and lowers prices [4].
These trends have given rise to fast fashion, which focuses on quick, low-priced, and diverse styles in small quantities, while significantly shortening the life cycle of apparel products. Consequently, retailers are constantly introducing new products. Compared with other traditional industries, the apparel industry is more consumer-oriented. Therefore, to understand consumers’ needs, apparel companies need to stay ahead of market trends. Previously, designers would absorb elements from the internet, streets, fashion magazines, or even what they saw in dramas, subjectively believing that products containing these elements would be popular [5]. Typically, a company cannot determine whether a product will attract consumers until it has been launched. If the product sells well, the company is fortunate; if it does not, the resulting excess supply and wasted materials are among the main reasons why manufacturing costs cannot be reduced.
In addition, with the growing popularity of the internet and social media sites, it is becoming increasingly convenient for people to keep up with fashion trends. The faster people purchase fashion products, the more garment companies produce them. The fashion industry is the second largest polluter worldwide [6]. Owing to the rapid growth of social media, consumers are vulnerable to changes in their preferences as a result of external factors. This increases the difficulty and uncertainty of forecasting customer demand, particularly for new products, for which historical sales data are lacking. Thus, demand forecasting plays an important role in the success of apparel companies. Inaccurate forecasting may lead to product stockouts, overstock, or even the waste of manufacturing costs. Therefore, an efficient forecasting method is necessary in the apparel industry.
In recent years, artificial intelligence (AI) models have shown strong advantages, such as the capability to derive “arbitrarily nonlinear” approximation functions directly from the data, and they are now common in demand and sales forecasting in many industries [7]. AI methods can be used to predict tendencies or preferences for apparel and sales demand. More accurate forecasts benefit both garment companies and environmental protection, which makes this a win-win situation.
Therefore, the objective of this study was to examine how algorithms (machine learning and web crawlers) can assist humans (designers) in making decisions (fashion design). We propose a two-stage method that includes suggestions and intelligent forecasting to improve efficiency in the apparel industry. In the first stage, we use a web crawler on a B2C website to identify popular products. The component elements of the popular products are then disassembled and suggested to designers for designing new products. In the second stage, we apply machine learning methods to historical data to predict the sales demand for the new products once they have been designed, and then incorporate external information indices from Google Trends to reduce demand forecasting errors. This study not only helps a company know whether a product will be a hot seller before it is launched but also shows that external factors have a measurable influence on predicting future trends.
The main contributions of this study are as follows: (1) A web crawler is used to support designers in fashion design and in creating new products. (2) An intelligent forecasting method is proposed to predict the sales demand for new products; the objective of this method is to add external information indices to reduce forecasting errors. (3) Even without historical sales records for a new product, future demand can be predicted using the historical sales of products that are highly similar to it.
In traditional forecasting, sales are predicted using statistical methods such as exponential smoothing, linear regression, the moving average, the weighted average, the Bayesian approach, and ARIMA. Reference [8] examined the applicability of the Bayesian forecasting model to fashion products and used simulated fashion product demand to demonstrate that the model generated better results than other forecasting methods. The authors in [9] considered the retail of different types of women’s shoes as an example, showing that ARIMA is more accurate than the other three forecasting approaches considered and is generally suitable for forecasting time series. Although statistical forecasting is simple and rapid, it has limitations. Because statistical methods emphasize causality, they infer future values from historical data and can effectively predict products with stable demand. However, traditional statistical methods may not produce accurate results when demand is highly irregular.
AI methods have emerged owing to the increasing power of computers. In this era of information explosion, with the advantages of data and computing power, AI uses different aspects of information to solve prediction problems. Fashion sales are usually affected by selling prices, color, climate, and social media. Thus, the AI technique can be applied to forecast sales demand in the fashion industry. In the past, several studies used AI techniques for fashion sales forecasting. Reference [10] proposed an artificial neural network (ANN) model that allows nonlinear approximation functions to learn the above exogenous factors directly from data to predict women’s apparel sales. The authors of [11] employed an evolutionary neural network (ENN) model to forecast fashion sales using evolutionary computation to generate an ANN. They found that ENN is suitable for products with low demand uncertainty and weak seasonal trends, but not for seasonal products. Sun et al. [12] discussed the relationship between fashion sales and some important factors affecting demand using an extreme learning machine (ELM) model. Their results showed that the ELM model was superior to several sales forecasting methods. However, the ELM is not suitable for complex tasks; it performs well for simpler tasks.
Recently, the authors in [13] studied the performance of bagging tree regression, random forest (RF) regression, and gradient boosting regression algorithms by predicting the sales demand of multiple products from different stores in the next three months; the results showed that the performance of fuzzy time series (FTS) regression was the best. Reference [14] proposed a model for intuitionistic FTS forecasting based on the average length of the interval, which enhances the forecasting result.
The authors in [15] developed a hybrid model that combines an ANN and a fuzzy function based on the concept of comparable products, and forecasted new apparel product demand while considering stockouts. Reference [16] built a model that included multiple variables, in which both new and historical products were represented by determinate variables, including product characteristics, internal organization, and expert opinions. In contrast, we use both internal (historical sales data) and external (Google Trends data) data to predict sales demand for new products.
Reference [17] studied large-scale fashion sales data and inferred that apparel attributes and sales factors affect demand. They used the attributes of a given new product to predict demand with different machine learning methods. Recently, [18] applied AI methods such as decision trees, random forests, XGBoost, and an artificial neural network to predict textile product sales. They used a dataset that included sales time, sales volume, and location to forecast sales in stores. In contrast, we use sales data and product features to predict the sales volume of new products. The proposed models use fewer data and are suitable for short-term or trendy products in the fashion industry. To the best of our knowledge, this is the first study that combines product design suggestions and sales forecasts. First, we help designers design products based on popular elements, and then we provide an intelligent forecasting method to predict new product sales demand in the apparel industry.
This section includes the following two phases. Phase 1, presented in Section 3.1, describes the use of a web crawler to help designers design new products. Phase 2, presented in Section 3.2, introduces the two demand forecasting models (the baseline model and the intelligent forecasting approach). Figure 1 illustrates the research framework.
A web crawler is an internet bot that automatically browses content on the internet, such as pictures, text, and videos, and can be used to obtain a large amount of public information. In Phase 1, to achieve assisted design, we must obtain the component elements of the products; thus, we use a web crawler on a B2C online store to obtain sales figures and product names. We collect sales information from the website to identify popular products and elements and then provide designers with suggestions for a new product design. A product’s elements are its characteristics, which could be the type of clothes, sleeves, materials used, colors, or any other attribute that makes the product different from or similar to others. This information can be obtained from text descriptions or through image recognition. Almost all B2C websites provide customer references. Figure 2 shows an example of the different elements divided based on the sleeve styles. The result of Phase 1 is a new product design. Thus, Phase 1 serves as the suggestion system for the designers.
An example of this is presented below. Figure 3 shows the product page of the B2C online store, where the red squares represent the content collected by the web crawler. We collected data from June 1, 2020, to July 31, 2020, then split product names, discarded subjective adjectives, used objective nouns as elements, and disassembled them into
Owing to the lack of historical sales data for new products, sales forecasts can be made based on similar products (i.e., internal data). As the internal data, we selected existing products that share three or more elements with the elemental composition of the new product. For example, after Stage 1, the designer designs a new product (Item A). Item A has elements No. 1-4-6-8 (q=1, q=4, q=6, and q=8). If previous products have elements (1,4,6), (1,4,6,10), and (1,6,8,13,20) and their sales volumes are 500, 1000, and 2000, respectively, each of them shares three elements with Item A, so the sales volumes (500, 1000, 2000) will be used as the internal data input for clustering and classification before forecasting the sales demand of Item A.
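The selection rule in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_internal_data`, the `min_shared` parameter, and the fourth historical product are hypothetical, and the threshold of three shared elements is the one implied by the worked example.

```python
def select_internal_data(new_elements, history, min_shared=3):
    """Keep historical (elements, sales) pairs that share at least
    `min_shared` elements with the new product."""
    new_set = set(new_elements)
    return [
        (elements, sales)
        for elements, sales in history
        if len(new_set & set(elements)) >= min_shared
    ]


# Item A and the three historical products from the example above.
item_a = [1, 4, 6, 8]
history = [
    ((1, 4, 6), 500),
    ((1, 4, 6, 10), 1000),
    ((1, 6, 8, 13, 20), 2000),
    ((2, 4, 9), 300),  # hypothetical product that shares only one element
]

internal = select_internal_data(item_a, history)
internal_sales = [sales for _, sales in internal]
print(internal_sales)  # [500, 1000, 2000]
```

Using set intersection keeps the rule independent of the order in which elements are listed for a product.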
The baseline model is as follows: First, we used clustering algorithms (
Because new products have no past sales data, we cluster the sales of the historical products that share three or more elements with the new product, grouped by their element combinations. Clustering also serves as a pre-processing step for classification. It is used to examine the relationship between element composition and sales, that is, whether the number of shared elements or the presence of key elements explains the difference in distance from the cluster center (the sales volume). In this study, the elbow method was used to determine the K value, and the Euclidean distance was used for the calculation.
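The elbow procedure can be illustrated with a small, dependency-free sketch. It assumes a k-means-style algorithm (the text mentions a K value, cluster centers, and the Euclidean distance, but the exact algorithm is not recoverable here), applied to one-dimensional sales volumes. The sales figures, the function names, and the deterministic initialization are all hypothetical choices made for the illustration.

```python
def assign(values, centers):
    """Index of the nearest center (Euclidean distance in 1-D) for each value."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j])) for v in values]


def kmeans_1d(values, k, max_iter=100):
    """Plain Lloyd's algorithm on 1-D sales volumes; returns (centers, labels)."""
    data = sorted(values)
    # Deterministic start: spread the initial centers evenly over the sorted data.
    centers = [float(data[(2 * i + 1) * len(data) // (2 * k)]) for i in range(k)]
    for _ in range(max_iter):
        labels = assign(values, centers)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)


def wcss(values, centers, labels):
    """Within-cluster sum of squared distances, the quantity inspected in the elbow method."""
    return sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))


# Hypothetical sales volumes of historical products similar to the new product.
sales = [40, 45, 50, 480, 500, 520, 1950, 2000, 2050]

elbow_curve = []
for k in range(1, 6):
    centers, labels = kmeans_1d(sales, k)
    elbow_curve.append(wcss(sales, centers, labels))

centers3, labels3 = kmeans_1d(sales, 3)
print(elbow_curve)  # drops sharply up to K = 3, then flattens
print(centers3)     # [45.0, 500.0, 2000.0]
```

The K value is read off the curve at the point where adding another cluster no longer reduces the within-cluster sum of squares by much; in this toy data that point is K = 3.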
Both
In this study, two machine learning methods (RF and XGBoost) were utilized to solve classification and regression problems. The two machine learning methods are illustrated as follows:
The RF algorithm steps for both classification and regression are as follows [19]:
Select bootstrap samples from the original data. For each of the bootstrap samples, grow an unpruned classification or regression tree. At each node, the best split is chosen from a randomly selected subset of the features rather than from the complete feature set. All predictions of
Therefore, the learning process of each tree is randomized. Candidate elements (features) are randomly selected and their entropy values are calculated; the feature with the smallest entropy is chosen as the root node, and the selection and splitting are then repeated at the subsequent nodes. In a random forest, the split features are randomly selected, multiple decision trees are built in parallel, and the final classification result is determined by a majority vote. In this study, the final result is represented as a probability. For example, suppose there are three classes (1, 2, 3) and the random forest creates 100 decision trees. If 20 percent of the trees vote for Class 1, 30 percent for Class 2, and 50 percent for Class 3, the final result is Class 3, which has the highest probability (50 percent).
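The voting example can be reproduced with a short sketch. It only illustrates how per-tree votes turn into class probabilities and a final label; it does not train any trees, and the function name `vote_probabilities` is hypothetical.

```python
from collections import Counter


def vote_probabilities(tree_votes):
    """Share of trees voting for each class label."""
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {label: counts[label] / total for label in sorted(counts)}


# 100 trees: 20 vote for Class 1, 30 for Class 2, and 50 for Class 3.
tree_votes = [1] * 20 + [2] * 30 + [3] * 50

probabilities = vote_probabilities(tree_votes)
predicted_class = max(probabilities, key=probabilities.get)

print(probabilities)    # {1: 0.2, 2: 0.3, 3: 0.5}
print(predicted_class)  # 3
```

Keeping the full probability dictionary, rather than only the winning label, matches the way the class probabilities are reused as inputs in the later regression step.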
The authors in [20] proposed the
After training, we obtain the label of a new product based on the probability of being assigned to each label. As per the rule, the highest probability determines the label to which it belongs. Therefore, we obtained the probability of each internal datum for each label and used these probabilities in the next step.
In this section, to predict new-product sales demand, we used RF and XGBoost regressions. In the previous steps, we identified the labels of both the internal data and the new product and the probability of each label. In this section, the input features are the new product elements, the probability of each label, the cluster center, and the constant. We now introduce them individually.
The new product element is The probability of each label is The cluster center is The constant is
In addition, we chose some common hyperparameters for the random forest and XGBoost, such as the learning rate, the number of iterations, and the penalty coefficient used to avoid overfitting. The output data represent the real sales demand for each internal data point. After training, we obtain the prediction results for the new product.
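To make the four groups of input features concrete, the following sketch assembles one feature row. The encoding is an assumption made purely for illustration: elements are represented as 0/1 indicators over a hypothetical element vocabulary, and the function name, vocabulary, and numeric values are not taken from the study.

```python
def build_feature_row(product_elements, vocabulary, label_probabilities,
                      cluster_center, constant=1.0):
    """Concatenate element indicators, label probabilities, cluster center, and constant."""
    # Assumed encoding: 1.0 if the product contains the element, else 0.0.
    indicators = [1.0 if element in product_elements else 0.0 for element in vocabulary]
    return indicators + list(label_probabilities) + [float(cluster_center), float(constant)]


# Hypothetical vocabulary and inputs for a single product.
vocabulary = ["tee", "short sleeve", "plain", "vest"]
row = build_feature_row(
    product_elements={"tee", "plain"},
    vocabulary=vocabulary,
    label_probabilities=[0.2, 0.3, 0.5],
    cluster_center=500,
)
print(row)  # [1.0, 0.0, 1.0, 0.0, 0.2, 0.3, 0.5, 500.0, 1.0]
```

Rows built this way for the internal data, paired with their real sales, would form the training set of a regressor, and the same function would produce the row for the new product at prediction time.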
Recently, many researchers have demonstrated the value of social media information in operational management. The authors in [21] showed how social media information from Facebook can be used to reduce out-of-sample forecasting errors. Reference [22] supported the results of [21] by showing that Google Trends can also reduce out-of-sample forecasting errors. Both studies addressed the B2C industry, in which customer activities directly affect demand, and both used online platforms to capture customer behavior as external information to improve the accuracy of sales forecasting. This study considers the sales demand for new products in the B2C apparel industry. We used Google Trends search indices for specific search terms, namely the elements of the new products, as external information indices. For example, if a new product A has eight elements, we checked Google Trends to determine how often those elements were searched. A higher search volume for an element indicates higher potential demand for the product.
Figure 4 shows the application of external data to the baseline model to obtain an improved demand-forecasting framework. We set a crucial threshold to measure external indices and used multiple regression analysis to transform the external data into demand-sense adjustments.
In the baseline model, we assume that the new-product sales demand is
Based on the results of the baseline model, we selected internal data with the same label as the new product and divided them into two groups for regression analysis to adjust demand forecasting. We set the difference between the sales and demand forecasting of the baseline model as
Finally, we design an intelligent forecasting approach, as shown in Equation (6). We combined the results of the baseline model with demand sense adjustments, including the addition and subtraction of external information index data.
According to the forecasting steps mentioned earlier, in this section, we use actual data to compare the improvement in accuracy of different forecasting models. Section 4.1 discusses the elements filtered from the results of the data crawler and the composition of the new product design. Section 4.2 demonstrates the results of the intelligent forecasting approach with the implemented demand-sense adjustment. Furthermore, we compared the accuracy of sales predicted by the forecasting methods mentioned in Section 3 with the actual sales of new products in August.
In this subsection, we used a web crawler on an online shopping store that specializes in women’s apparel and collected the sales data of all 112 products from June 1, 2020, to July 31, 2020. We then selected the tops from among these products. We split the product names, discarded subjective adjectives, used objective nouns as elements, and disassembled them into 83 elements. These elements make the products different from or similar to other products. The product names and sales volumes come from the apparel store, and the names were split into elements using natural language processing. Subsequently, designers attempted to create new products based on the popular elements. Consequently, eight items were created that could be considered new products.
The results of the element combinations for new products are as follows:

Item A (contains eight elements): tee, short sleeve, long version, spliced, plain, split, T-shirt, and frill.

Item B (contains six elements): tee, short sleeve, neck, plain, light, and shear.

Item C (contains eight elements): tee, plain, loose, vest, swing, crew neck, skater, and sleeveless.

Item D (contains six elements): tee, short sleeve, neck, plain, T-shirt, and pocket.

Item E (contains seven elements): tee, neck, plain, vest, full, wrap-over, and sleeveless.

Item F (contains seven elements): tee, short sleeve, splice, cotton T-shirt, crew neck, cap sleeve, and drape.
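The element extraction and popularity ranking described in this subsection can be sketched as follows. This is a simplified stand-in rather than the natural language processing pipeline used in the study: it assumes that product names arrive as "/"-separated phrases, and the list of subjective words, the product names, and the sales figures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical subjective adjectives to discard; a real list would be far longer.
SUBJECTIVE_WORDS = {"cute", "elegant", "stylish"}


def split_elements(product_name):
    """Split a '/'-separated product name and drop subjective adjectives."""
    tokens = [token.strip().lower() for token in product_name.split("/")]
    return [token for token in tokens if token and token not in SUBJECTIVE_WORDS]


def rank_elements(products):
    """Total sales per element, sorted from most to least popular."""
    totals = defaultdict(int)
    for name, sales in products:
        for element in set(split_elements(name)):
            totals[element] += sales
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical crawled records: (product name, units sold).
products = [
    ("cute / tee / short sleeve / plain", 120),
    ("tee / vest / sleeveless", 80),
    ("elegant / short sleeve / frill", 40),
]

ranking = rank_elements(products)
print(ranking[:3])  # [('tee', 200), ('short sleeve', 160), ('plain', 120)]
```

A ranking of this kind is what would be handed to designers as a suggestion of which elements to combine in a new product.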
Based on the new products we designed in Section 4.1, we generated forecasted sales demand in the following month (August 2020) through the baseline model and then applied an intelligent forecasting approach for adjustment. In this study, the performance of these approaches was evaluated based on the forecasting errors of new product sales in August 2020. Three evaluation metrics, the mean squared error (MSE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE), were used to measure the performance of the different methods.
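For reference, the three metrics can be computed as in the sketch below, which assumes their standard definitions. The sales and forecast values are hypothetical and are not results from this study.

```python
import math


def mse(actual, forecast):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root-mean-square error."""
    return math.sqrt(mse(actual, forecast))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical actual sales and forecasts for two periods.
actual = [100, 80]
forecast = [90, 100]

print(mse(actual, forecast))   # 250.0
print(rmse(actual, forecast))  # about 15.81
print(mape(actual, forecast))  # 17.5
```

Because MAPE divides by the actual sales, a small actual value inflates it, which is worth keeping in mind when interpreting items with very low real sales.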
In
Based on the results of the baseline model for each new product, we implemented an intelligent forecasting approach, the results of which are presented in Table 1. Table 2 presents the evaluation metrics used to measure the performance of each item using the intelligent forecasting approach.
Table 1. Demand forecasting results after demand sense adjustment

Item | Real Sales | Baseline Model | Intelligent Forecasting Approach |
---|---|---|---|
A | 57 | 49.51 | 59.15 |
B | 2 | 12.61 | 0.60 |
C | 10 | 20.22 | 12.11 |
D | 13 | 25.99 | 7.57 |
E | 86 | 144.70 | 112.58 |
F | 65 | 51.45 | 55.02 |
Table 2. Evaluation metrics of intelligent forecasting approach

Item | MSE | RMSE | MAPE |
---|---|---|---|
A | 4.64 | 2.15 | 3.78% |
B | 1.97 | 1.40 | 70.17% |
C | 4.46 | 2.11 | 21.11% |
D | 29.45 | 5.43 | 41.75% |
E | 706.60 | 26.58 | 30.91% |
F | 99.51 | 9.98 | 15.35% |
The improvements of the intelligent forecasting approach relative to the baseline model are summarized in Figure 5. The comparison shows that the intelligent forecasting approach was more accurate than the baseline model. The relative MSE improvement ranged from 45.79% to 98.25%, the relative RMSE improvement from 26.35% to 86.80%, and the relative MAPE improvement from 26.34% to 94.72%, which were calculated as follows:

$$\text{Relative improvement} = \frac{E_{\text{baseline}} - E_{\text{intelligent}}}{E_{\text{baseline}}} \times 100\%$$

where $E$ denotes the MSE, RMSE, or MAPE of the corresponding model.
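As a numerical check, the sketch below computes the relative RMSE improvement for the second row of Table 1 (real sales 2, baseline forecast 12.61, intelligent forecast 0.60). It assumes that the relative improvement is the reduction in error divided by the baseline error, and that with a single observation per item the RMSE equals the absolute forecast error; both assumptions are consistent with the tabulated values. The function name is hypothetical.

```python
def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error metric relative to the baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Second row of Table 1.
real_sales = 2
baseline_forecast = 12.61
intelligent_forecast = 0.60

# With one observation, RMSE reduces to the absolute forecast error.
rmse_baseline = abs(real_sales - baseline_forecast)
rmse_intelligent = abs(real_sales - intelligent_forecast)

rmse_gain = relative_improvement(rmse_baseline, rmse_intelligent)
print(round(rmse_gain, 1))  # 86.8
```

The result agrees with the upper end of the reported RMSE improvement range (86.80%).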
This study proposes a two-stage method, including suggestions and intelligent forecasting, to improve the sales forecasting of new products in the apparel industry. First, we used a web crawler to collect product data and decomposed the results into elements to determine which elements were popular and preferred. These results can be used as a reference by designers when designing new products. Second, we showed how to predict sales demand for new products from similar items and how to improve the forecasting model by combining it with external information indices. The core value of this study is the application of algorithms (machine learning and web crawlers) to assist humans (designers) in making decisions (fashion design). We combine market trends with demand forecasting and external information indices to reduce forecasting errors. The results showed that, compared with the baseline model built on RF and XGBoost, the intelligent forecasting approach reduced the MSE, RMSE, and MAPE by at least 45.79%, 26.35%, and 26.34%, respectively. The proposed intelligent forecasting approach can be an effective method for the B2C industry to forecast new product demand. The proposed model is suitable for the fashion industry, where the selling period is often shorter than a few months and designers must design new products as quickly as possible.
The demand forecasting method used in this study can be studied further. We provide the following suggestions: (1) Based on this forecasting method, the sales demand of products could be forecast throughout their life cycle. (2) This study uses a machine learning-based method with demand-sense adjustments; in future studies, we plan to conduct experiments with additional machine learning algorithms to enhance the performance of the baseline model. (3) External information indices can also be obtained from other sources, such as people’s recent preferences on other social media sites, and the analysis of search volume can be extended from new product elements to product styles or colors. (4) This study uses nondigital product elements or features to make predictions; in the future, we may verify whether this method can be applied to other products that are also represented by product features. Furthermore, the method is not limited to the fashion industry and can be tested in other B2C industries or supply chain network design problems [23–25].