Revenue forecast models using hybrid intelligent methods

Financial management comprises the strategic planning, organizing, directing, and control of all financial activities within a company or institution. It also entails applying management principles to an organization’s financial assets. Financial planning and budgeting ensure that the company has enough liquidity. A company that carries out financial planning has, in effect, drawn a financial map: it can assess the causes of deviations from its targets and, based on income and expense statements, identify its financial strengths, weaknesses, and needs. Financial planning also supports decisions that keep the company’s value as high as possible, so that the company can expand its range of products or services and pursue new opportunities to increase productivity and profitability. It further allows a company to compare its position in the industry with that of other companies. Good financial planning requires understanding the future values of certain financial components and managing the related processes accordingly, so accurate forecasts of those components are critical.

In the last few years, numerous methods have been used for revenue forecasting. In [1], Chen et al. predicted the direction of one-year-ahead earnings changes using machine learning methods and high-dimensional financial data. Their models outperformed conventional models based on Logistic Regression (LR) with small sets of accounting variables, as well as professional analysts’ forecasts. In [2], Chung et al. investigated how different machine learning models performed in revenue estimation for local governments and compared the performance of various machine learning algorithms. The findings revealed that traditional statistical methods outperformed machine learning algorithms overall, while K-Nearest Neighbors (KNN) was more effective in predicting property tax revenue. In [3], Kureljusic et al. investigated and analyzed prediction models generated by machine learning algorithms using publicly available data. Compared with financial analysts, the machine learning algorithms provided more accurate revenue forecasts. In [4], Lin et al. presented Generalized Additive Models (GAMs) and machine learning models based on Artificial Neural Networks (ANNs) developed to predict the optimal revenues of an integrated power generation and storage system. The predictive equations and models were built on optimized solutions from the Conventional Hydroelectric Power and Environmental Resource System (CHEERS) model. The validation prediction errors of the GAMs and machine learning models were less than 5%; regression equations in machine learning models performed better. In [5], Mousa et al. proposed using earnings per share as a performance metric to predict corporate financial performance, employing three supervised machine learning methods: Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Random Forest (RF). They used a sample of 63 publicly traded banks from eight emerging markets between 2008 and 2017. The study concluded that the best prediction model was obtained with RF and provided evidence of the accuracy and performance of the presented models. In [6], the authors described machine learning applications for financial forecasting in detail. Machine learning appears well suited to enhancing forecasting, analysis, and planning by largely automating the extraction of information from massive datasets. The simulation carried out in that paper demonstrated the potential of machine learning for forecasting and planning, and it also examined how forecasting and planning developed as the number of data points increased.
In [7], the authors compared the performance of the Capital Asset Pricing Model (CAPM) with that of machine learning algorithms for predicting the price of financial assets, and also considered implementations of machine learning algorithms on High-Performance Computing infrastructures. After being trained on time series data, the machine learning models beat CAPM on out-of-sample test data. In [8], the authors used machine learning and deep learning approaches to provide a reliable forecast of total corporate profits in the US economy. The primary tool used to implement the prediction method was the RapidMiner software. Drawing on these predictions and on economic theory, the article explored the implications of assumptions made to date regarding the relations between the working class and the elite. In [9], the authors examined the use of deep learning methods in the financial industry and evaluated the maturity of the technology in different areas of the financial sector. Deep learning was not yet the most widely used technology in the financial industry, but the research showed that some problems in this area required deep learning features. In [10], a hybrid approach combining Simple Linear Regression (SLR) and RF was proposed to increase estimation accuracy. It was also recommended to use effective floor space for training the SLR and RF models. The hybrid approach proved effective in reducing the risk of Building Information Modeling labor cost estimation. In [11], the authors showed, in large-scale financial time series experiments, that forecasts more accurate than those made by professional financial analysts could be produced. In [12], the cash flows of accounts receivable were estimated using methods applicable to companies with a large number of customers and transactions. Before moving to neural networks with a Multilayer Perceptron, the study discussed forecasting techniques that had not previously been used for cash flows, such as Autoregressive Integrated Moving Average, Prophet, and Long Short-Term Memory (LSTM) networks. In [13], the authors showed the hierarchy of importance of the financial factors of institutions and made predictions with deep learning and machine learning models, comparing the recommended Extreme Gradient Boosting and deep LSTM models. In [14], the authors examined revenue projections made by financial professionals and identified factors that affect forecast accuracy. The benefit of revenue estimations was projected using a model created for this purpose. Analysts who perform worse in predicting sales are more likely to give up than analysts who perform better. The study helped academic researchers and investors understand the factors that influence income estimates. In [15], Chinese privately traded companies were analyzed as an example of financial distress using statistical methods, and a prediction model based on Support Vector Machines (SVM) was developed. Grid search with 10-fold cross-validation was used to find the best parameter value of the SVM kernel function. The SVM model outperformed traditional statistical methods and back propagation neural networks.

The rest of this paper is organized as follows. Section 2 briefly describes the generation of the dataset. Section 3 presents the methodology, covering isolation forest, minimum redundancy maximum relevance, density-based spatial clustering of applications with noise, and random forest. Section 4 reports the results and discussion in detail. Section 5 concludes the paper by summarizing its results.

Dataset Generation

The dataset contains 1826 daily records covering the period from January 1st, 2017 to December 31st, 2021 and includes the revenue of a seller along with several other features. The attributes in the dataset and their explanations are given in Table 1.

Table 1

Attributes found in the dataset.

Attribute Name	Definition
Year	Year
Month	Month
Day	Day of the month
Weekday	Weekday
Weekend	Weekend
USD_close	Closing value of the USD
USD_open	Opening value of the USD
USD_max	Maximum value of the USD
USD_min	Minimum value of the USD
EUR_close	Closing value of the EUR
EUR_open	Opening value of the EUR
EUR_max	Maximum value of the EUR
EUR_min	Minimum value of the EUR
Bist100_close	Closing value of the Bist100
Bist100_open	Opening value of the Bist100
Bist100_max	Maximum value of the Bist100
Bist100_min	Minimum value of the Bist100
Bist100_capacity	Capacity value of the Bist100
Total_covid_cases	Total number of people who have caught Covid
New_covid_cases	Number of new Covid cases
Total_deaths	Total number of people who died from Covid
New_deaths	Number of new deaths from Covid
New_covid_tests	Number of new tests
Total_covid_tests	Total number of tests
New_covid_vaccinations	Number of new vaccines
Total_covid_vaccinations	Total number of vaccines
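
The calendar attributes in Table 1 and the row count can be derived from the date range alone. The following Python sketch illustrates this under two assumptions that the dataset description does not state: Weekday is encoded as 0 (Monday) to 6 (Sunday), and Weekend is a 0/1 flag for Saturday and Sunday. The function name `calendar_rows` is illustrative.

```python
from datetime import date, timedelta

def calendar_rows(start, end):
    """Build one row of calendar attributes per day in [start, end]."""
    rows = []
    day = start
    while day <= end:
        weekday = day.weekday()  # assumption: 0 = Monday ... 6 = Sunday
        rows.append({
            "Year": day.year,
            "Month": day.month,
            "Day": day.day,
            "Weekday": weekday,
            "Weekend": int(weekday >= 5),  # assumption: Saturday/Sunday flag
        })
        day += timedelta(days=1)
    return rows

rows = calendar_rows(date(2017, 1, 1), date(2021, 12, 31))
print(len(rows))  # 1826: five years of daily records, including the leap day in 2020
```

The count of 1826 matches the number of rows reported for the dataset, which is consistent with one record per calendar day and no missing days.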

Methodology

3.1 Isolation forest

Isolation Forest (IF) is used to find outliers and abnormalities. Rather than modeling the typical points and measuring how far a data point deviates from them, it isolates outliers directly: the data are partitioned recursively with random splits in binary trees, and points that are separated after only a few splits are flagged as anomalous. By explicitly isolating outliers in this way, IF offers a faster anomaly detector that targets abnormalities without profiling all the typical cases. Its linear time complexity, small constant, and minimal memory usage make it successful when dealing with massive volumes of data [16]. The statistics of the revenue are given in Table 2, and the statistics obtained after removing the extreme values are given in Table 3.

Table 2

Statistics of revenue.

Statistics Name	Values
Number_of_lines	1826
Minimum	0
Maximum	4758634.92
Mean	1353609.99

Table 3

Statistics of revenue after applying IF method.

Statistics Name	Values
Number_of_lines	665
Minimum	0
Maximum	4654363.55
Mean	1242224.18
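
The isolation idea can be illustrated with a short Python sketch. It is a deliberate simplification and not the implementation used in the study: it works on a single numeric column, draws a fresh random tree for every evaluation, and omits the subsampling and score normalization of the full algorithm in [16]. The revenue values and function names are hypothetical.

```python
import random

def isolation_depth(x, values, rng, depth=0, max_depth=50):
    """Number of random splits needed to separate x from the other values."""
    lo, hi = min(values), max(values)
    if len(values) <= 1 or lo == hi or depth >= max_depth:
        return depth
    split = rng.uniform(lo, hi)  # random cut point, as in an isolation tree
    # Keep only the values that fall on the same side of the cut as x.
    same_side = [v for v in values if (v < split) == (x < split)]
    return isolation_depth(x, same_side, rng, depth + 1, max_depth)

def mean_depth(x, values, n_trees=200, seed=0):
    """Average isolation depth over many random trees; lower is more anomalous."""
    rng = random.Random(seed)
    return sum(isolation_depth(x, values, rng) for _ in range(n_trees)) / n_trees

# Hypothetical daily revenues: eight typical days and one extreme day.
revenue = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 100.0]
depths = {v: mean_depth(v, revenue) for v in revenue}
most_anomalous = min(depths, key=depths.get)
print(most_anomalous)  # the extreme value 100.0 needs the fewest splits
```

Because a random cut over the full range almost always lands between the typical values and the extreme one, the extreme value is isolated near the root, which is why a short average path is read as a sign of an outlier.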

3.2 Minimum redundancy maximum relevance

Feature selection has a big impact on how well estimation algorithms perform. It consists of identifying and selecting the most informative features of the dataset. Unnecessary features can lengthen the model’s training time, increase its error rate when the test data differ considerably from the training data, and induce overfitting, in which the model succeeds on the training dataset but fails on the test dataset. For these reasons, the minimum Redundancy Maximum Relevance (mRMR) algorithm [17] was used for feature selection in order to improve the performance of the models developed for revenue forecasting. mRMR favors features that are strongly related to the target while being weakly related to the features already selected. The features selected using mRMR are: Weekend, Year, Weekday, EURO_open, Total_covid_cases, USD_open, EURO_open, New_covid_tests, EURO_max, USD_min, EURO_close, Total_covid_deaths, USD_max, USD_close, Total_covid_tests.
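
A greedy form of the selection rule can be sketched as follows. This is an illustration under stated assumptions, not the procedure of [17] as used in the study: absolute Pearson correlation stands in for the mutual-information measures of the original mRMR, the score is relevance minus mean redundancy, and the feature names and values are toy data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def mrmr(features, target, k):
    """Greedy selection: maximize relevance minus mean redundancy at each step."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    while len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected

# Toy data: "copy" nearly duplicates "signal"; "other" is weaker but independent.
features = {
    "signal": [2, -2, 2, -2, 2, -2, 2, -2],
    "copy":   [2.5, -1.5, 2.0, -2.5, 1.5, -2.0, 2.5, -1.5],
    "other":  [1, 1, -1, -1, 1, 1, -1, -1],
}
target = [3, -1, 1, -3, 3, -1, 1, -3]  # equals signal + other
print(mrmr(features, target, k=2))  # ['signal', 'other']
```

Ranking by relevance alone would pick "signal" and then "copy"; the redundancy penalty is what moves the weaker but non-overlapping "other" ahead of the near-duplicate.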

3.3 Density-based spatial clustering of applications with noise

Clustering analysis is an unsupervised learning technique that separates the data points into distinct groups, with the aim that data points within the same group have similar characteristics and data points in different groups differ. In this study, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [18] method was applied. Density-based clustering identifies distinctive groups in the data based on the idea that a cluster in data space is a contiguous region of high point density, separated from other such clusters by contiguous regions of low point density. Within the scope of the study, the dataset was divided into 5 clusters using the DBSCAN method. The revenue statistics of the clusters obtained are given in Table 4.

Table 4

Statistics of revenue after using DBSCAN.

Cluster	Number of lines	Minimum	Mean	Maximum
Cluster 1	399	1051826.12	1234772.39	1431150.39
Cluster 2	41	734682.64	762463.19	789182.86
Cluster 3	190	862860.11	956598.34	1044740.19
Cluster 4	115	94800.86	147999.73	202836.34
Cluster 5	168	1433005.72	1497854.28	1568416.04
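
The density-based grouping and the per-cluster statistics reported in Table 4 can be sketched in a few lines of Python. The sketch assumes a single numeric column, and the values of `eps` and `min_pts` as well as the revenue figures are hypothetical; the parameters and feature space used in the study are not stated.

```python
def dbscan_1d(values, eps, min_pts):
    """Label each value with a cluster index, or -1 for noise."""
    NOISE = -1
    labels = [None] * len(values)  # None = not yet visited

    def neighbors(i):
        return [j for j, v in enumerate(values) if abs(values[i] - v) <= eps]

    cluster = 0
    for i in range(len(values)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels

revenue = [100.0, 102.0, 104.0, 500.0, 503.0, 506.0, 900.0]
labels = dbscan_1d(revenue, eps=5.0, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]

# Per-cluster statistics in the layout of Table 4 (noise points excluded).
for c in sorted(set(labels) - {-1}):
    members = [v for v, lab in zip(revenue, labels) if lab == c]
    print(c, len(members), min(members), sum(members) / len(members), max(members))
```

Unlike methods that require the number of clusters in advance, the number of groups here follows from `eps` and `min_pts`, and isolated points are left out as noise rather than forced into a cluster.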

3.4 Random forest

RF is a supervised learning technique used here for regression. During training, RF builds numerous decision trees and averages their outputs to produce the final prediction. For each tree, RF randomly selects k data points from the training set and grows a decision tree on those k points. Once the desired number of trees has been built, each tree predicts the y-value for a new data point, and these predictions are averaged [19]. The hyperparameter ranges searched with grid search and the values found for RF are given in Table 5.
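
The sample-then-average step described above can be sketched in Python. The sketch is a simplified stand-in rather than a full RF: each tree is a depth-1 regression tree on one feature, every tree is fitted on a bootstrap sample of the same size as the training set, and no random feature subsets are drawn. The data and function names are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Depth-1 regression tree: best threshold by squared error, with leaf means."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds: x <= t goes left
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x identical: predict the overall mean
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def fit_forest(xs, ys, n_trees=50, seed=0):
    """Fit each tree on a bootstrap sample and average the tree predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)

xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5  # hypothetical step-shaped target
predict = fit_forest(xs, ys)
print(predict(1), predict(8))  # low prediction for small x, high for large x
```

Averaging over trees fitted on different samples smooths out the errors of any single tree, which is the main reason an ensemble is preferred over one deep tree.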

Table 5

Hyperparameter values of RF.

Hyperparameter Range	Model Hyperparameter Values
“min_samples_leaf”:[3,4,5,6]	min_samples_leaf: 5
“min_samples_split”:[3,4,5,6]	min_samples_split: 5
“n_estimators”:[50, 100, 200]	n_estimators: 100
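
The mechanics of the grid search over the ranges in Table 5 can be sketched as follows. Only the ranges come from Table 5; the scoring function `toy_error` is a hypothetical placeholder that is contrived so that its minimum coincides with the values reported in Table 5, and it does not train or evaluate any model.

```python
from itertools import product

# Search ranges as listed in Table 5.
param_grid = {
    "min_samples_leaf": [3, 4, 5, 6],
    "min_samples_split": [3, 4, 5, 6],
    "n_estimators": [50, 100, 200],
}

def grid_search(param_grid, evaluate):
    """Return the combination with the lowest score and the number of combinations."""
    names = list(param_grid)
    combos = [dict(zip(names, values)) for values in product(*param_grid.values())]
    return min(combos, key=evaluate), len(combos)

def toy_error(params):
    """Hypothetical stand-in for a validation MAPE; not a real model score."""
    return (abs(params["min_samples_leaf"] - 5)
            + abs(params["min_samples_split"] - 5)
            + abs(params["n_estimators"] - 100) / 100)

best, n_combos = grid_search(param_grid, toy_error)
print(n_combos)  # 48 combinations = 4 * 4 * 3
print(best)      # {'min_samples_leaf': 5, 'min_samples_split': 5, 'n_estimators': 100}
```

In practice `evaluate` would fit an RF with the given parameters and return its validation error. The parameter names in Table 5 match those of scikit-learn's `RandomForestRegressor`, although the text does not state which library was used.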

Results and discussion

Three different approaches were applied in order to forecast the revenue accurately. In the first approach, forecast models were developed with RF after applying only simple preprocessing steps to the dataset. In the second approach, IF was used to detect outliers in the dataset, and the mRMR feature selection algorithm was used to select the features that affect the quality of the revenue forecast. In the last approach, feature selection was performed first and DBSCAN was then used to cluster the dataset. After these processes were carried out, forecast models were developed with RF. The dataset includes the daily revenue of a seller among several other features and covers the period from January 1st, 2017 to December 31st, 2021. Grid search was used to obtain the best values of the RF hyperparameters. The performance of the developed models was evaluated using the Mean Absolute Percentage Error (MAPE). The MAPE values of the models developed with the three approaches are shown in Table 6.

Table 6

MAPE values of the models developed with the three approaches.

Approach	MAPE (%)
First Approach	24.20
Second Approach	16.95
Third Approach/Cluster 1	7.69
Third Approach/Cluster 2	1.90
Third Approach/Cluster 3	4.66
Third Approach/Cluster 4	17.42
Third Approach/Cluster 5	2.36
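
MAPE is the mean of the absolute errors expressed as a percentage of the actual values, MAPE = (100/n) Σ |A_t − F_t| / |A_t|, where A_t is the actual and F_t the forecast revenue on day t. A minimal Python sketch is given below. Since the minimum revenue in the dataset is 0 and the ratio is undefined for a zero actual value, the sketch skips such days; this handling is an assumption, as the text does not say how zero-revenue days were treated. The sample values are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent, skipping zero actual values."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

actual = [100.0, 200.0, 0.0, 400.0]    # hypothetical revenues, one zero day
forecast = [110.0, 180.0, 5.0, 400.0]  # hypothetical model output
print(round(mape(actual, forecast), 2))  # about 6.67
```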

The average of the MAPE values obtained for the five clusters in the third approach is 6.80.

− When the prediction models developed with the first and second approaches are compared, the MAPE obtained with the second approach is 7.25 percentage points lower. This indicates that removing the outliers from the dataset and applying the feature selection algorithm before developing the prediction model gives more successful results.

− When the prediction models developed with the first and third approaches are compared, the average MAPE obtained with the third approach is 17.40 percentage points lower.

− When the prediction models developed with the second and third approaches are compared, the average MAPE obtained with the third approach is 10.15 percentage points lower.

− These comparisons show that dividing the dataset into clusters and developing a forecast model for each cluster yields more successful revenue forecasts.

− The results show that the IF outlier detection algorithm improves the performance of the models.
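
The figures in these comparisons follow directly from Table 6 and can be checked with a few lines of Python. The sketch reads the stated differences as absolute gaps between MAPE values (percentage points) and, as an addition not found in the text, also prints the corresponding relative reductions. The unweighted cluster mean evaluates to 6.806, which the text gives as 6.80.

```python
cluster_mape = [7.69, 1.90, 4.66, 17.42, 2.36]  # third approach, Table 6
first, second = 24.20, 16.95                    # first and second approach, Table 6

third_avg = sum(cluster_mape) / len(cluster_mape)  # unweighted mean over clusters
third_reported = 6.80                              # value stated in the text

print(round(third_avg, 3))  # 6.806

# Absolute gaps in percentage points, as reported in the comparisons.
gaps = {
    "second vs first": first - second,
    "third vs first": first - third_reported,
    "third vs second": second - third_reported,
}
for name, gap in gaps.items():
    print(name, round(gap, 2))

# Relative reductions in percent of the higher MAPE (not stated in the text).
relative = {
    "second vs first": 100 * (first - second) / first,
    "third vs first": 100 * (first - third_reported) / first,
    "third vs second": 100 * (second - third_reported) / second,
}
for name, value in relative.items():
    print(name, round(value, 1))
```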

Conclusion

Corporate financial planning teams can use forecasts produced by machine learning algorithms from historical data to identify opportunities that will affect the future and growth of the business, plan according to predicted consumer behavior, and stay ahead of the competition. In this way, they can respond quickly to changes and make flexible but sound financial decisions by simulating different scenarios. In this study, three different approaches were applied to forecast revenue. In the first approach, forecast models were developed with RF. In the second approach, the IF and mRMR algorithms were used. In the last approach, feature selection was performed first and DBSCAN was then used to cluster the dataset. After these processes were carried out, forecast models were developed with RF. The results show that the lowest MAPE value was obtained with the third approach: its average MAPE is 17.40 percentage points lower than that of the first approach and 10.15 percentage points lower than that of the second approach.

Declarations

6.1 Conflict of interest

The authors hereby declare that there is no conflict of interest regarding the publication of this paper.

6.2 Funding

Not applicable.

6.3 Authors’ contributions

G.T.: Conceptualization, Data Curation, Methodology, Formal Analysis, Writing-Original Draft, Writing-Review and Editing. T.A.K.: Software, Data Curation, Validation. K.P.: Formal Analysis, Data Curation. M.F.A.: Supervision. All authors read and approved the final submitted version of this manuscript.

6.4 Acknowledgement

Not applicable.

6.5 Data availability statement

All data that support the findings of this study are included within the article.

6.6 Use of AI tools

The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.

eISSN: 2956-7068

Article Category: Original Study

Published: 31 Oct 2023

Page range: 117 - 124

Received: 14 Jun 2023

Accepted: 22 Jul 2023

DOI: https://doi.org/10.2478/ijmce-2024-0009

Keywords: Revenue forecasting, machine learning, hybrid methods, marketplace

© 2024 Gizem Topaloğlu et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
