Published Online: 15 Jun 2020 Page range: 243 - 253
Abstract
A web-based browser fingerprint (or device fingerprint) is a tool used to identify and track user activity in web traffic. It is also used to identify computers that abuse online advertising and to prevent credit card fraud. A device fingerprint is created by extracting multiple parameter values from a browser API (e.g. operating system type or browser version). The acquired parameter values are then passed through a hash function to produce a hash. The disadvantage of this method is its excessive susceptibility to small, normally occurring changes (e.g. a change in the browser version number or screen resolution). Minor changes in the input values generate a completely different fingerprint hash, making it impossible to find similar fingerprints in the database. On the other hand, omitting these unstable values when creating a hash significantly limits the ability of the fingerprint to distinguish between devices. This weak point is commonly exploited by fraudsters, who knowingly evade this form of protection by deliberately changing the values of device parameters. The paper presents methods that significantly limit this type of activity. New algorithms for coding and comparing fingerprints are presented, in which the values of parameters with low stability and low entropy are given particular attention. The fingerprint generation methods are based on the popular MinHash, LSH, and autoencoder methods. The effectiveness of each of the presented coding and comparison methods was also examined against the currently used hash generation method. Authentic data on the devices and browsers of users visiting 186 different websites were collected for the research.
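The contrast between an exact hash and a similarity-preserving signature can be illustrated with a minimal MinHash sketch. It is not the authors' algorithm: the parameter names (`os`, `browser_version`, and so on), the use of salted SHA-256 as the hash family, and the signature length of 128 are all assumptions made for illustration.

```python
import hashlib


def exact_hash(params):
    """Classic fingerprint: one hash over all parameter values."""
    canonical = "|".join(f"{k}={v}" for k, v in sorted(params.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()


def minhash_signature(params, num_hashes=128):
    """MinHash signature over the set of 'key=value' tokens.

    Each seed acts as a different hash function (assumption: a salted
    SHA-256 stands in for a family of independent hash functions).
    """
    tokens = [f"{k}={v}" for k, v in params.items()]
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.sha256(f"{seed}:{t}".encode()).digest()[:8], "big")
            for t in tokens))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


# Hypothetical device parameters; only the browser version differs.
device_a = {
    "os": "Windows 10", "browser": "Firefox", "browser_version": "76.0",
    "screen": "1920x1080", "timezone": "UTC+1", "language": "pl-PL",
    "platform": "Win32", "color_depth": "24",
}
device_b = dict(device_a, browser_version="77.0")

print(exact_hash(device_a) == exact_hash(device_b))
print(estimated_similarity(minhash_signature(device_a),
                           minhash_signature(device_b)))
```

In this sketch the two exact hashes differ, while the MinHash estimate stays close to the true Jaccard similarity of the two token sets (7 shared tokens out of 9 distinct ones), so a database lookup by similarity remains possible after a browser update.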
Published Online: 15 Jun 2020 Page range: 255 - 270
Abstract
Air quality prediction in urban areas is of great significance for controlling air pollution and protecting public health. The prediction of air quality at monitoring stations is well studied in existing research. However, air quality monitoring stations are insufficient in most cities, and air quality varies dramatically from one place to another due to complex factors. A novel model is established in this paper to estimate and predict the Air Quality Index (AQI) of areas without monitoring stations in Nanjing. The proposed model predicts AQI in a non-monitoring area in both the temporal and the spatial dimension. The temporal model is presented first; it is based on an enhanced k-Nearest Neighbor (KNN) algorithm and predicts the AQI values among monitoring stations, with the acceptability of the results reaching 92% for one-hour prediction. To forecast the evolution of air quality in the spatial dimension, a Back Propagation (BP) neural network that considers geographical distance is used. Furthermore, to improve the accuracy and adaptability of the spatial model, the similarity of topological structure is introduced. In particular, the temporal-spatial model is built and its adaptability is tested on a specific non-monitoring site, the Jiulonghu Campus of Southeast University. The results demonstrate that the acceptability reaches 73.8% on average. The paper provides strong evidence that the proposed non-parametric, data-driven approach to air quality forecasting gives promising results.
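The idea behind a KNN-based temporal forecast can be sketched as follows. This is a plain KNN regression over lagged windows, not the enhanced variant the abstract refers to (its details are not given here); the window length, the value of `k`, and the synthetic hourly series are assumptions for illustration only.

```python
def knn_forecast(series, window=3, k=2):
    """Predict the next value of a series with plain KNN regression.

    The most recent `window` values form the query. Every earlier window
    is compared with it by Euclidean distance, and the values that
    followed the k closest windows are averaged.
    """
    query = series[-window:]
    candidates = []
    # Stop before the final window so the query is never its own neighbour.
    for start in range(len(series) - window):
        past = series[start:start + window]
        following = series[start + window]
        distance = sum((p - q) ** 2 for p, q in zip(past, query)) ** 0.5
        candidates.append((distance, following))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Synthetic hourly AQI readings with a repeating pattern (hypothetical data).
hourly_aqi = [50, 60, 70, 80] * 4
print(knn_forecast(hourly_aqi))  # 50.0
```

Because the approach is non-parametric, no model is fitted in advance: the forecast comes directly from the stored history, which matches the data-driven character described in the abstract.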
Published Online: 15 Jun 2020 Page range: 271 - 285
Abstract
In real-world approximation problems, precise input data are economically expensive. Therefore, fuzzy methods devoted to uncertain data are a focus of current research. Accordingly, this paper discusses a method based on fuzzy-rough sets for the fuzzification of inputs in a rule-based fuzzy system. A triangular membership function is applied to describe the nature of imprecision in the data. First, triangular fuzzy partitions are introduced to approximate common antecedent fuzzy rule sets. As a consequence of the proposed method, we obtain the structure of a general (non-interval) type-2 fuzzy logic system in which the secondary membership functions are cropped triangular. Then, the possibility of applying so-called regular triangular norms is discussed. Finally, an experimental system constructed on precise data, and then transformed and verified for uncertain data, is provided to demonstrate its basic properties.
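For readers unfamiliar with the term, a triangular membership function can be sketched as below. The function names, the `spread` parameter, and the temperature example are assumptions for illustration; the fuzzy-rough construction and the type-2 structure discussed in the paper are not reproduced here.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set.

    Assumes left < peak < right. The degree rises linearly from 0 at
    `left` to 1 at `peak`, then falls linearly back to 0 at `right`.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(reading, spread):
    """Model an imprecise reading as a symmetric triangle around it.

    `spread` is a hypothetical half-width expressing how imprecise the
    measurement is assumed to be.
    """
    return lambda x: triangular(x, reading - spread, reading, reading + spread)


# An imprecise temperature reading of 20 with an assumed spread of 2.
temperature = fuzzify(20.0, 2.0)
print(temperature(20.0), temperature(21.0), temperature(23.0))  # 1.0 0.5 0.0
```

A wider spread expresses less confidence in the reading, so more neighbouring values receive a non-zero membership degree.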
Published Online: 15 Jun 2020 Page range: 287 - 298
Abstract
A training set consists of many features that influence the classifier to different degrees. Choosing the most important features and rejecting those that do not carry relevant information is of great importance to the operation of the learned model. In the case of data streams, the importance of the features may additionally change over time. Such changes affect the performance of the classifier but can also be an important indicator of concept drift. In this work, we propose a new algorithm for data stream classification, called Random Forest with Features Importance (RFFI), which uses a measure of feature importance as a drift detector. The RFFI algorithm adapts solutions inspired by the Random Forest algorithm to data stream scenarios. The proposed algorithm combines the ability of ensemble methods to handle slow changes in a data stream with a new method for detecting the occurrence of concept drift. The work contains an experimental analysis of the proposed algorithm, carried out on synthetic and real data.
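The general idea of using feature importance as a drift signal can be sketched as follows. This is not the RFFI algorithm: the abstract does not specify the importance measure or the detection rule, so the sketch substitutes a deliberately simple importance proxy (the gap between class-conditional feature means for binary labels) and a hypothetical L1-distance threshold.

```python
def feature_importance(rows, labels):
    """Normalised importance proxy for each feature of one data window.

    Assumption: binary labels (0 and 1), with both classes present in the
    window. Importance is the absolute gap between the feature's mean in
    class 1 and its mean in class 0. A Random Forest importance score
    could be plugged in here instead.
    """
    scores = []
    for j in range(len(rows[0])):
        positive = [r[j] for r, y in zip(rows, labels) if y == 1]
        negative = [r[j] for r, y in zip(rows, labels) if y == 0]
        gap = sum(positive) / len(positive) - sum(negative) / len(negative)
        scores.append(abs(gap))
    total = sum(scores) or 1.0  # avoid division by zero when all gaps are 0
    return [s / total for s in scores]


def drift_detected(old_importance, new_importance, threshold=0.5):
    """Flag drift when the importance vectors differ by more than threshold."""
    change = sum(abs(a - b) for a, b in zip(old_importance, new_importance))
    return change > threshold


# Two consecutive windows of a hypothetical stream: at first feature 0
# separates the classes, later feature 1 does.
labels = [1, 1, 0, 0]
window_old = [[1.0, 0.5], [0.9, 0.5], [0.1, 0.5], [0.0, 0.5]]
window_new = [[0.5, 1.0], [0.5, 0.9], [0.5, 0.1], [0.5, 0.0]]

print(drift_detected(feature_importance(window_old, labels),
                     feature_importance(window_new, labels)))  # True
```

The point of the sketch is only that a shift in which features matter is observable without waiting for the classifier's accuracy to drop.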
Published Online: 15 Jun 2020 Page range: 299 - 316
Abstract
This paper presents a local modification of the Levenberg-Marquardt (LM) algorithm. First, the mathematical basics of the classic LM method are shown. The classic LM algorithm is very efficient for training small neural networks. For bigger neural networks, however, its computational complexity grows significantly, which makes the method practically inefficient. To overcome this limitation, a local modification of the LM method is introduced in this paper. The main goal of the paper is to reduce the computational complexity of the LM method by using local computation. The introduced modification has been tested on function approximation and classification benchmarks. The results obtained have been compared with the performance of the classic LM method. The paper shows that the local modification of the LM method significantly improves the algorithm's performance for bigger networks. Several proposals for future work are suggested.
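As background, the classic LM update solves (JᵀJ + μI)δ = Jᵀr for the parameter step δ, where J is the Jacobian of the model outputs, r the residual vector, and μ a damping factor. The sketch below applies that update to a two-parameter curve fit rather than a neural network, and does not show the local modification, which the abstract does not describe. The model `a * exp(b * x)`, the starting point, and the factor-of-ten damping schedule are assumptions for illustration.

```python
import math


def sum_squared_error(a, b, xs, ys):
    return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))


def lm_fit(xs, ys, a, b, mu=1e-2, iterations=100):
    """Fit y = a * exp(b * x) with the classic Levenberg-Marquardt update."""
    error = sum_squared_error(a, b, xs, ys)
    for _ in range(iterations):
        # Accumulate J^T J (2x2, symmetric) and J^T r (2x1).
        h11 = h12 = h22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            ja, jb = e, a * x * e      # partial derivatives w.r.t. a and b
            r = y - a * e              # residual
            h11 += ja * ja
            h12 += ja * jb
            h22 += jb * jb
            g1 += ja * r
            g2 += jb * r
        # Solve (J^T J + mu * I) * delta = J^T r for the 2x2 case.
        m11, m22 = h11 + mu, h22 + mu
        det = m11 * m22 - h12 * h12
        da = (m22 * g1 - h12 * g2) / det
        db = (m11 * g2 - h12 * g1) / det
        new_error = sum_squared_error(a + da, b + db, xs, ys)
        if new_error < error:
            # Step accepted: move and trust the Gauss-Newton direction more.
            a, b, error = a + da, b + db, new_error
            mu /= 10
        else:
            # Step rejected: increase damping, shifting towards gradient descent.
            mu *= 10
    return a, b, error


# Noise-free synthetic data generated with a = 2 and b = 0.5.
xs = [0.25 * i for i in range(9)]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b, error = lm_fit(xs, ys, a=1.0, b=0.1)
print(round(a, 3), round(b, 3))
```

The cost the abstract alludes to is visible in the linear solve: for a network with n weights the matrix JᵀJ + μI is n × n, so building and inverting it becomes expensive as n grows.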