Vehicle Driving Cycle (Driving Cycle), also known as vehicle test cycle, which describes the speed-time curve of the vehicle. It is an important general-purpose basic technology in the automotive industry, which is used to limit pollutant emissions and energy consumption, and to apply evaluation technology [1-2] for new vehicle development. In the early days, developed countries such as the United States, Japan, Europe, etc., all established vehicle driving conditions based on their cities. For example, Lin et al. [3] analyzed and compared the driving conditions constructed in California through a stochastic process; Michel et al. [4]. studied that different vehicles should be represented by different driving conditions. In recent years, domestic scholars have conducted in-depth research on the driving conditions of automobiles and have achieved some research results. For example: Gao Jianping [5], Peng Yuhui [6] and others use the K-means algorithm respectively to construct the driving conditions of passenger cars in Zhengzhou and Xiamen, but the K-means algorithm is easy to fall into a local optimal solution, which affects the stability and accuracy of the construction conditions; Hefei university of technology, represented by professor Shi Qin [7] team, led by typical road in hefei, for example, was carried out on the road data acquisition, processing, and using the particle swarm optimization method of fuzzy clustering [8], although can avoid local optimization to some extent, but for large data, there will be a problem of poor local search ability; Cao Qian et al. [9] constructed the driving conditions of Changchun city based on Markov chain, but when the data amount was small and the error of calculating the state matrix was large, the accuracy of the final construction conditions would be resulted.
Up to now, domestic scholars generally adopt the k-means algorithm to construct their own cities. The common problem is that the k-means algorithm is sensitive to outliers, the parameter K (cluster number) is difficult to determine, and it is easy to fall into the local optimal solution. Although some other algorithms have appeared, in the case of large amounts of data, the algorithm is inefficient in execution and cannot efficiently classify large quantities of data. Therefore, this paper proposes the Grid-K-means algorithm to achieve the clustering classification effect, and the clustering effect is stable, accurate and efficient, and the constructed working condition can comprehensively reflect the local traffic situation.
The data collected in this paper includes vehicle running time, speed, latitude and longitude and other information, sampling frequency of 1Hz, and data is collected on different roads. The original data is denoised [10-11] by discrete Fourier transform, the steps are as follows:
set f( In the above formula: adjust the value of the high frequency signal to 0 or a smaller value, while the value of the low frequency signal remains unchanged to achieve signal smoothing and noise reduction, the frequency domain is restored to the time domain by the inverse Fourier transform, the expression is as follows:
From Figure 1, we can see that the discrete Fourier transform has a good filtering effect on the data, which can effectively eliminate the influence of noise on the data, thus avoiding the distortion in the calculated acceleration.
Kinematics segment refers to the vehicle speed range from the beginning of the idle state to the beginning of the next idle state [12]. As shown in Figure 2 below:
According to the domestic and foreign research sports state classification standards, the acceleration threshold can be selected as 0.1m/s2 to define four driving states:
Idling state: the engine continues to work, and its speed is zero during this process; Deceleration state: the speed is not zero, and the acceleration during this continuous operation is less than or equal to -0.1m/s2; Acceleration state: the speed is not zero, and the acceleration is greater than or equal to 0.1m/s2 during this continuous operation; Constant speed state: the speed is not zero, and the absolute value of the acceleration during this continuous operation is less than or equal to 0.1m/s.
In order to fully and systematically reflect the driving characteristics of each feature and each operational segment, according to relevant literature[5-9], this paper selects 12 feature parameters to reflect each short-stroke feature:(segment duration) T/
Using Matlab to preprocess the 185726 sets of data to obtain 164089 sets of data, use R language programming to obtain 1585 kinematics segments, and then calculate the characteristic parameters of each kinematics segment respectively to obtain the number of samples (rows) x and characteristic parameters (columns), as shown in Table 1 below.
K
Fragment number | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 119 | 203.29 | 6.15 | 7.60 | 0.06 | 0.17 | 0.08 | 0.03 | 0.42 | -0.66 | 0.48 | 5.05 |
2 | 319 | 2320.7 | 26.19 | 33.63 | 0.14 | 0.26 | 0.19 | 0.11 | 0.30 | -0.34 | 0.37 | 17.5 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
78 | 243 | 346 | 7.03 | 12.23 | 0.04 | 0.21 | 0.08 | 0.04 | 0.38 | -0.61 | 0.39 | 5.01 |
79 | 167 | 459.12 | 5.97 | 8.04 | 0.06 | 0.19 | 0.05 | 0.05 | 0.44 | -0.59 | 0.28 | 4.91 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
1584 | 169 | 829.04 | 17.66 | 23.65 | 0.13 | 0.19 | 0.09 | 0.07 | 0.36 | -0.50 | 0.48 | 14.8 |
1585 | 486 | 6799.9 | 50.37 | 52.65 | 0.02 | 0.18 | 0.13 | 0.30 | 0.19 | -0.20 | 0.29 | 16.5 |
There are 1585 kinematics segments, each with 12 parameters, forming a 1585*12 matrix. The classification of 12-dimensional data requires a lot of work. Too much data will make the subsequent calculation steps more complicated, and some parameters are not independent of each other. Therefore, it is necessary to use principal component analysis to reduce the dimension of the data, which has reached a small number of principal components. It can reflect many original feature parameters.
This article uses SPSS for analysis based on the preprocessed data of the data set. The contribution rate and cumulative contribution rate of each principal component are shown in Table 2 below.
P
Serial number | eigenvalue | contribution/% | Cumulative contribution/% |
---|---|---|---|
1 | 5.5605 | 43.23 | 43.23 |
2 | 3.2315 | 26.83 | 70.06 |
3 | 1.9901 | 16.14 | 86.20 |
4 | 1.0434 | 6.14 | 92.34 |
5 | 0.53476 | 3.11 | 95.45 |
6 | 0.45796 | 2.01 | 97.46 |
7 | 0.38341 | 1.22 | 98.68 |
8 | 0.21525 | 0.78 | 99.46 |
9 | 0.17450 | 0.23 | 99.69 |
10 | 0.10126 | 0.19 | 99.88 |
11 | 0.09324 | 0.08 | 99.96 |
12 | 0.02532 | 0.04 | 100 |
It can be seen from Table 2 that the cumulative contribution rate of the first three principal components has reached 86.2%, and the eigenvalues of these three principal components are all greater than 1. According to statistical theory [13], in principal component analysis, in general, it is sufficient to select principal components with a cumulative contribution rate of 85%, which basically represents all the information of the 12 characteristic parameters of the fragment, so the first three principal components can be selected for cluster analysis.
Through SPSS software analysis, the load matrix of each principal component can be obtained, so the correlation coefficients of the first 3 principal components and 12 characteristic parameters can be determined. As shown in Table 3 below.
P
Characteristic parameters | |||
---|---|---|---|
Segment duration T | 0.132 | 0.231 | 0.745 |
distance S | 0.293 | 0.134 | 0.045 |
Average speed V |
0.719 | 0.463 | -0.025 |
Average driving speed V |
0.478 | 0.615 | 0.112 |
Idle time ratio T |
0.125 | -0.351 | 0.843 |
Acceleration time ratio T |
0.694 | -0.156 | 0.060 |
Deceleration time ratio T |
0.923 | 0.341 | -0.123 |
Cruising time ratio T |
0.641 | 0.435 | -0.045 |
Mean acceleration a |
0.014 | 0.623 | 0.033 |
Mean deceleration a |
0.366 | -0.433 | -0.052 |
Standard deviation of acceleration a |
0.445 | 0.267 | -0.067 |
Standard deviation of speed Vstd | 0.387 | 0.215 | 0.034 |
The principal component loading represents the correlation coefficient between the principal component and the original variable. The larger the correlation coefficient (absolute value), the more representative the principal component of the variable. According to the principal component loading matrix in the above table, some parameters corresponding to a certain principal component can be obtained. As shown in Table 4 below.
T
Category | Eigenvalue | |
---|---|---|
Primary component | principal M1 | average speed, acceleration time ratio, deceleration time ratio, cruising time ratio |
Secondary component | principal M2 | average speed, acceleration time ratio, deceleration time ratio, cruising time ratio |
Third component | principal M3 | segment duration, idle time ratio |
According to principal component analysis, the first, second and third principal components were selected for further analysis.
For traditional clustering algorithm clustering, the initial clustering center has a greater impact on the clustering results, which is easy to reduce the accuracy of the algorithm, unable to handle noisy data, and the clustering results are unstable and relatively inefficient when processing large-scale data sets [14-15]. This paper proposes a Grid-K-means clustering algorithm based on Grid improvement, which introduces the idea of grid division into K-means algorithm, reduces the number of manually set initial parameters, speeds up the ability to process data, and reduces the time complexity. This improves the overall efficiency of the algorithm. The main steps of the algorithm are as follows:
Let the data set be S, and each sample point x Read all the sample points in the data set, and record the maximum and minimum values of each dimension E Calculate the data range of each dimension, R set the grid number of each dimension (GridNum), Determine the grid step size of each dimension in the data set: Convert the data set into grid points, set the dynamic threshold, if the grid density is less than the threshold, it is noise, and if the grid density is equal to zero, it is invalid grid points T Sort the grid points, select K high-density grid points as the initial cluster centers (C Iteratively and circularly calculating the grid clustering center until the grid clustering center no longer changes ; Reversely calculating the corresponding cluster center of sample points
In order to verify the performance of the improved algorithm Grid-K-means proposed in this article, simulation experiments select 3 sets of data (20,000, 80,000, and 100,000) for verification, and the accuracy of clustering results and the performance of different algorithms are compared respectively, as shown in the following table 5 below.
C
Data set | k-means | k-means++ | Grid-K-means | |||
---|---|---|---|---|---|---|
Accuracy | Elapsed time | Accuracy | Elapsed time | Accuracy | Elapsed time | |
20000 | 92.76% | 25.82s | 93.16% | 34.86s | 95.78% | 25.83s |
80000 | 88.32% | 43.83s | 89.32% | 49.16s | 92.36% | 40.69s |
100000 | 87.24% | 58.12s | 87.65% | 67.94s | 90.12% | 50.38s |
It can be seen that the performance improvement of grid-K-means algorithm is not significant when dealing with small data sets; but as the amount of data increases, the accuracy of the clustering results and the clustering speed of Grid-K-means are both Far beyond k-means, and k-means++ algorithm.
The kinematics fragments are divided into 3 categories by using the improved K-means algorithm. Table 6. shows the average eigenvalues of each category after clustering. As can be seen from Table 6, the first category is traffic congestion, because the average speed and average driving speed are low and the idle time ratio is high; The second type is smooth road conditions, because the average speed and the average driving speed are higher, the idle time ratio is in the middle, and the cruising time ratio is in the middle; The third category is high-speed road conditions, because the average speed and average driving speed are the highest, idle time ratio is the lowest.
A
Clustering categories | ||||||
---|---|---|---|---|---|---|
Class 1 | 6.23 | 0.12 | 11.06 | 0.16 | 26.15 | 0.59 |
Class 2 | 16.25 | 0.25 | 18.26 | 0.59 | 35.56 | 0.23 |
Class 3 | 23.03 | 0.49 | 30.51 | 0.45 | 48.12 | 0.18 |
According to the construction of the city, the duration of the working condition is about 1200s. According to the time curve, the kinematic segments are selected from the three categories respectively to be 56,89,225, forming the final representative working condition. Fitting the representative working condition time-speed graph and working condition time-acceleration graph, as shown in Figure 3 and Figure 4.
Comparing the characteristic parameters of the collected overall sample data with the characteristic parameters calculated based on traditional k-means, k-means++, and Grid-K-means, as shown in Table 7 below. Among them, E
C
Characteristic parameters | Overall sample data | k-means | k-means++ | Grid-K-means | |||
---|---|---|---|---|---|---|---|
Fitted value | Relative error/% | Fitted value | Relative error/% | Fitted value | Relative error/% | ||
18.91 | 18.11 | 4.23 | 18.34 | 3.01 | 18.60 | 1.64 | |
1.85 | 1.95 | 5.41 | 1.88 | 1.62 | 1.86 | 0.54 | |
-2.36 | -2.45 | 3.81 | -2.43 | 2.97 | -2.39 | 1.27 | |
0.52 | 0.58 | 11.54 | 0.55 | 5.77 | 0.53 | 1.92 | |
0.49 | 0.41 | 16.32 | 0.44 | 10.20 | 0.47 | 4.08 | |
0.26 | 0.31 | 19.23 | 0.29 | 11.54 | 0.28 | 7.69 | |
10.09 | 5.85 | 2.86 |
The average relative error of k-means is 10.09%, the average relative error of k-means++ is 5.85%, and the average relative error of all characteristic parameters of driving conditions constructed based on the improved k-means algorithm (Grid-K-means) is only 2.86%, and the relative error of the other eigenvalue parameters is less than 8%, so it shows that the Grid-K-means method can more accurately represent the driving conditions than the traditional method.
The driving acceleration of the whole sample is divided into three categories [17], (-∞,-0.64) m/s2, [-0.64,0.64] m/s2, (0.64,+∞)m/s2 ,and the sample speed of each type of acceleration is divided into six categories,[0,10) m/s, [10,20) m/s, [20,30) m/s, [30,40) m/s, [40,50) m/s, [50, +∞) m/s. In order to further test the similarity between acceleration distribution of representative working conditions and test data, K-S test of independent samples is carried out, and the test results are shown in Table 8 below. From this, we found that the acceleration distribution of the improved clustering algorithm Grid-K-means fitting working conditions is better than that of k-means++, which can better reflect the distribution frequency of acceleration.
S
Method | Acceleration distribution (m/s2) | |||
---|---|---|---|---|
(-∞,-0.64) | [-0.64,0.64] | (0.64,+∞) | ||
Grid-K-means | K-S value | 0.68 | 0.57 | 0.53 |
Similarity level | 0.89 | 0.96 | 0.99 | |
k-means++ | K-S value | 0.71 | 0.26 | 0.45 |
Similarity level | 0.79 | 0.44 | 0.82 |
In order to effectively construct vehicle driving conditions based on the city’s own data, this paper uses discrete Fourier transform to preprocess the data, eliminate the noise in the data, and use principal component analysis to reduce the dimension of the data, and propose an improved k-means (Grid-K-means) optimization method to cluster the data, replace the initial parameter setting with the idea of dynamic grid division to improve clustering efficiency, speed up data processing, and reduce time complexity. Through experimental analysis: the average relative error between the driving conditions and the characteristic parameters of the experimental data is only 2.86%. The K-S test is used to verify and analyze the similarity between the fitting conditions and the experimental data. The results show that the constructed driving conditions are more accurate and more accurate and can better reflect the overall driving characteristics.