The driving condition of a vehicle, also called its operating cycle, is the speed-time profile of the vehicle in a specific environment. It is mainly used to evaluate pollutant emissions and energy consumption, and it is valuable for developing new vehicle models and for assessing the risk of traffic control measures [1]. Many scholars have studied it. Nguyen et al. [2] proposed a driving cycle construction process based on Markov chain theory. Ding Yifeng et al. [3] used multivariate statistical methods such as principal component analysis and cluster analysis to construct vehicle driving cycles. Liu Yingji et al. [4] exploited the fuzzy connections between kinematic segments and constructed driving cycles by combining principal component analysis with fuzzy C-means clustering. Most research on driving cycles focuses on selecting the initial centers for K-means clustering or on a single improvement to the K-means algorithm; the combined optimization of principal component analysis and clustering, and the execution time it requires, have received little attention.
Achieving a good clustering result at an acceptable time cost therefore still requires improving K-means clustering. Zhang Rui et al. [5] proposed the OICCK-means algorithm to address the heavy dependence of the traditional K-means algorithm on the initial cluster centers. Zhang Lin et al. [6] adopted a density-based idea to overcome the sensitivity of the traditional algorithm to the initial centers. Luo Junfeng et al. [7] introduced information entropy and weighted distance to remove outliers. Zhang Yan [8] proposed an improved rough K-means clustering algorithm based on density weighting, which improves clustering accuracy, reduces the number of iterations, and weakens the interference of noise and outliers on the results. However, this algorithm gains accuracy at the expense of efficiency: most of its running time is spent computing the density of data objects, and its time complexity is high.
Based on the above analysis, this paper proposes a combined optimization method that pairs improved principal component analysis with improved K-means clustering. The method introduces the maximum-minimum distance method and a weighted Euclidean distance, and increases the weights of the clustering features according to their contribution factors. The results show that the clustering effect is stable, the time consumption is low, and the constructed driving cycle is widely applicable and matches the characteristics of real traffic conditions.
The data collected in this paper are actual road driving records of a light vehicle in a city in September 2019, sampled at 1 Hz. The recorded fields include time, GPS speed, longitude and latitude, instantaneous fuel consumption, etc. Disturbed, discontinuous data were repaired by fitting interpolation, and contaminated data were smoothed by wavelet decomposition and reconstruction [9]. After preprocessing in Matlab, the data set was reduced from 194511 to 164039 records.
Based on the analysis of relevant data and related research, 12 characteristic parameters are defined to describe the kinematic segments [10]. In this paper, 12 characteristic parameters including segment duration/
The interval from the start of one idle period to the start of the next idle period is called a kinematic segment [11]. This paper uses Python programs that traverse the data in a loop with a stack, dividing the 164039 preprocessed records into 2445 kinematic segments.
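The segment definition above can be illustrated with a minimal sketch. It is not the program used in this paper: the function name `split_kinematic_segments`, the `idle_threshold` parameter and the sample speed trace are assumptions made for illustration, and a plain loop stands in for the stack-based traversal.

```python
def split_kinematic_segments(speeds, idle_threshold=0.0):
    """Split a 1 Hz speed trace into kinematic segments.

    A segment runs from the first sample of one idle period up to (but not
    including) the first sample of the next idle period.
    """
    # Record every index where an idle period begins:
    # the vehicle is idle now and was not idle on the previous sample.
    idle_starts = []
    previously_idle = False
    for i, v in enumerate(speeds):
        idle = v <= idle_threshold
        if idle and not previously_idle:
            idle_starts.append(i)
        previously_idle = idle

    # Each pair of consecutive idle starts delimits one complete segment.
    # Data after the last idle start is an incomplete segment and is dropped.
    return [speeds[a:b] for a, b in zip(idle_starts, idle_starts[1:])]


# Hypothetical speed trace (km/h), one sample per second.
speeds = [0, 0, 5, 12, 8, 0, 0, 0, 7, 15, 3, 0, 4]
segments = split_kinematic_segments(speeds)
print(segments)
```

With this trace the idle periods start at samples 0, 5 and 11, so two complete segments are returned and the trailing samples are discarded as incomplete.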
Traditional principal component analysis reduces the dimension of the data with a linear technique; in eliminating the influence of order of magnitude, it also discards the information on how the characteristic factors differ. In real data, the relationships between variables are often nonlinear.
A comprehensive evaluation that uses the variance contribution rate as the weight cannot reasonably explain the analysis results, and the evaluation can even deviate greatly from the facts [12]. Therefore, this paper adopts the specific gravity method proposed in reference [13]. The improved principal component analysis not only eliminates the dimension noise, but also retains more of the characteristic parameter information while reducing dimension. The formula is as follows:
In the case of dimension reduction, the improved principal component forms a matrix with the obtained number of data samples (
Figure 2 shows that the contribution of each successive principal component decreases, with an obvious inflection point in the curve. Figure 1 shows that the first principal component carries 41.5% of the information in the improved principal component analysis results, which meets the requirement that fewer principal components represent more information.
The larger the absolute value of a parameter's principal component loading, the stronger the correlation between that parameter and the principal component [14]. Figure 3 shows the correlation of each characteristic parameter directly. According to Table 1, the first principal component is dominated by driving distance, average deceleration and average driving speed, with correlation coefficients of 3.15, 2.08 and 3.69 respectively, so it correlates most strongly with driving distance and average driving speed. The second principal component is dominated by average speed and cruise time ratio, with correlation coefficients of 2.75 and 3.84 respectively, so it correlates most strongly with cruise time ratio. The third principal component is dominated by idle time ratio and deceleration time ratio, with correlation coefficients of 3.06 and 2.85 respectively, so it correlates most strongly with idle time ratio. The fourth principal component is dominated by fragment duration, with a correlation coefficient of 2.43, which indicates a strong correlation with fragment duration. The IPCA results show that the first four principal components reflect the characteristics of the original segments: the matrix of 12 characteristic parameters for the whole sample is compressed into a matrix of eight characteristic parameters that represents the vast majority of the sample information.
Table 1. Principal component loading matrix
Characteristic parameter | Principal component 1 | Principal component 2 | Principal component 3 | Principal component 4 |
---|---|---|---|---|
Deceleration time ratio | 0.423 | 0.341 | −0.723 | 0.248 |
Distance traveled | 0.893 | 0.134 | 0.045 | 0.432 |
Fragment duration | 0.432 | 0.231 | −0.142 | 0.768 |
Acceleration time ratio | 0.394 | −0.156 | 0.060 | 0.491 |
Cruise time ratio | 0.341 | 0.835 | −0.045 | −0.138 |
Average velocity | 0.499 | 0.763 | 0.025 | 0.255 |
Average driving speed | 0.778 | 0.315 | 0.112 | 0.358 |
Speed standard deviation | 0.198 | 0.033 | 0.034 | 0.189 |
Acceleration standard deviation | 0.145 | 0.267 | −0.067 | −0.121 |
Average acceleration | 0.014 | 0.223 | 0.033 | 0.024 |
Average deceleration | 0.566 | −0.433 | −0.052 | 0.315 |
Idle time ratio | 0.125 | −0.351 | 0.843 | 0.467 |
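As an illustration of how the loading matrix can be read, the sketch below assigns each characteristic parameter to the principal component on which its absolute loading is largest and keeps only parameters whose largest absolute loading reaches a cut-off. The cut-off of 0.5 and the function name `dominant_parameters` are assumptions made for this example, not values stated in the paper; with that cut-off the sketch happens to recover the eight parameters named in the text.

```python
# Loadings copied from the principal component loading matrix (PC1..PC4).
LOADINGS = {
    "Deceleration time ratio": (0.423, 0.341, -0.723, 0.248),
    "Distance traveled": (0.893, 0.134, 0.045, 0.432),
    "Fragment duration": (0.432, 0.231, -0.142, 0.768),
    "Acceleration time ratio": (0.394, -0.156, 0.060, 0.491),
    "Cruise time ratio": (0.341, 0.835, -0.045, -0.138),
    "Average velocity": (0.499, 0.763, 0.025, 0.255),
    "Average driving speed": (0.778, 0.315, 0.112, 0.358),
    "Speed standard deviation": (0.198, 0.033, 0.034, 0.189),
    "Acceleration standard deviation": (0.145, 0.267, -0.067, -0.121),
    "Average acceleration": (0.014, 0.223, 0.033, 0.024),
    "Average deceleration": (0.566, -0.433, -0.052, 0.315),
    "Idle time ratio": (0.125, -0.351, 0.843, 0.467),
}


def dominant_parameters(loadings, threshold=0.5):
    """Group parameters by the component with their largest absolute loading.

    Parameters whose largest absolute loading is below `threshold`
    (an assumed cut-off) are left out.
    """
    selected = {}
    for name, row in loadings.items():
        # Index of the component with the largest absolute loading.
        pc_index = max(range(len(row)), key=lambda j: abs(row[j]))
        if abs(row[pc_index]) >= threshold:
            selected.setdefault(pc_index + 1, []).append(name)
    return selected


selected = dominant_parameters(LOADINGS)
for pc in sorted(selected):
    print(f"PC{pc}: {', '.join(selected[pc])}")
```

The sign of a loading is ignored on purpose: a strongly negative loading, such as deceleration time ratio on the third component, indicates as strong a relationship as a positive one.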
Actual test data always contain some interference, which often produces outliers or noise that degrade the clustering effect. Here, we construct a residual point distance mean sum method to eliminate the influence of noise and outliers [15]. For the
Among them,
Among them,
Among them,
Among them,
The initial new weight is as follows:
Among them, the clustering accuracy is
Among them,
If $A_{final} > A_{init}$, accept the new weight and set $A_{init} = A_{final}$; otherwise, keep the old weight unchanged.
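The three ingredients named in this section (maximum-minimum distance initialization, a weighted Euclidean distance, and the accept-or-keep rule for weights) can be sketched as follows. This is a hedged illustration only: the distance is assumed to take the common form sqrt(sum_j w_j (x_j - y_j)^2), the first point is arbitrarily used as the first center, and all function names and sample points are hypothetical rather than taken from the paper.

```python
import math


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance, assumed form: sqrt(sum_j w_j * (x_j - y_j)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def max_min_centers(points, k, weights):
    """Pick k initial centers with the maximum-minimum distance rule."""
    # Start from the first point (an arbitrary choice made for this sketch).
    centers = [points[0]]
    while len(centers) < k:
        # The next center is the point whose nearest existing center is farthest away.
        nxt = max(
            points,
            key=lambda p: min(weighted_distance(p, c, weights) for c in centers),
        )
        centers.append(nxt)
    return centers


def accept_weights(old_weights, new_weights, acc_init, acc_final):
    """Keep the new weights only if the clustering accuracy improved."""
    if acc_final > acc_init:
        return new_weights, acc_final
    return old_weights, acc_init


# Hypothetical two-feature points with equal weights.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8)]
weights = (1.0, 1.0)
centers = max_min_centers(points, 3, weights)
print(centers)
```

Spreading the initial centers in this way keeps two centers from starting inside the same dense region, which is the failure mode that makes plain K-means sensitive to its initialization.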
The improved K-means algorithm is applied to the driving data described above. First, edge data and outliers are detected and the abnormal points are eliminated. In Figure 4, cluster 1 contains the normally clustered points and cluster 2 contains the outlying edge data. Figure 5 shows that the edge data lie relatively far from most normal points; most of the edge data are outliers and can be eliminated.
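The idea of discarding points that lie far from most of the data can be sketched with a simple distance-based filter. This is a stand-in under stated assumptions, not the residual point method of this paper: each point's mean distance to all other points is compared with an assumed cut-off of `factor` times the average of those mean distances, and the function name, the factor of 1.5 and the sample points are all hypothetical.

```python
import math


def remove_outliers(points, factor=1.5):
    """Split points into (kept, dropped) using mean pairwise distance.

    A point is dropped when its mean distance to all other points exceeds
    `factor` times the average of those mean distances (assumed rule).
    Requires at least two points.
    """
    n = len(points)
    mean_dist = []
    for i, p in enumerate(points):
        total = sum(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dist.append(total / (n - 1))

    cutoff = factor * sum(mean_dist) / n
    kept = [p for p, d in zip(points, mean_dist) if d <= cutoff]
    dropped = [p for p, d in zip(points, mean_dist) if d > cutoff]
    return kept, dropped


# Four tightly grouped points and one distant edge point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
kept, dropped = remove_outliers(points)
print(kept, dropped)
```

The all-pairs loop costs O(n²) distance evaluations, so for a data set of a few thousand segments it is cheap, but it would need a faster neighbor search on much larger data.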
Based on the improved principal component analysis above, the characteristic parameters with high contribution factors and high correlation are used to draw the three-dimensional plot shown in Figure 6. In this paper, average speed, driving distance and cruise time ratio are selected to represent each clustered point.
The improved K-means clustering algorithm divides the kinematic segments into four categories, denoted cluster 1, cluster 2, cluster 3 and cluster 4. Figure 7 shows that the first type is the downtown area, where vehicles start and stop frequently and the average speed, cruise time ratio and driving distance are low; the second type is the living area, which is congested, with many starts and stops and relatively low average speed, cruise time ratio and driving distance; the third type is the suburban area, with smooth road conditions, fewer starts and stops, and higher average speed, cruise time ratio and driving distance; the fourth type is the high-speed area, with smooth traffic, few starts and stops, and high average speed, cruise time ratio and driving distance.
The time allotted to each class of segments in the final driving cycle is calculated from that class's share of the total duration of all segments in the data set [16]. This paper constructs a vehicle driving cycle of 1400 s, as shown in Figure 8. The first class is the low-speed segment, the second the medium-speed segment, the third the medium-high-speed segment, and the fourth the high-speed segment.
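The proportional allocation described above amounts to t_k = T · T_k / ΣT, where T is the length of the constructed cycle and T_k is the total duration of the segments in class k. A minimal sketch follows; the class durations are made-up numbers chosen only to make the arithmetic easy to follow, and the function name is hypothetical.

```python
def allocate_cycle_time(class_durations, cycle_length=1400):
    """Share out the cycle length in proportion to each class's total duration."""
    total = sum(class_durations.values())
    return {name: cycle_length * d / total for name, d in class_durations.items()}


# Made-up total durations (s) of the segments in each class.
class_durations = {"class 1": 300, "class 2": 500, "class 3": 150, "class 4": 50}
allocation = allocate_cycle_time(class_durations)
print(allocation)
```

Segments from each class are then chosen so that their combined duration comes as close as possible to the time allotted to that class.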
The difference between the constructed driving cycle and the experimental data is verified in terms of speed and acceleration [11], which is a relatively standard verification method. Matlab is used to calculate the joint speed-acceleration distribution matrix of the vehicle driving cycle data, as shown in Figure 9.
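A joint speed-acceleration distribution can be sketched in a few lines of Python as a complement to the Matlab calculation. The sketch rests on several assumptions: acceleration is taken as the difference between consecutive 1 Hz speed samples, the bin widths are arbitrary, and the difference between two distributions is measured as the sum of absolute differences over all cells, which may not be the exact measure used in this paper.

```python
from collections import Counter


def joint_distribution(speeds, speed_bin=10.0, accel_bin=0.5):
    """Fraction of samples in each (speed bin, acceleration bin) cell of a 1 Hz trace."""
    # At 1 Hz, acceleration is the difference between consecutive speed samples.
    accels = [b - a for a, b in zip(speeds, speeds[1:])]
    cells = Counter(
        (int(v // speed_bin), int(a // accel_bin))
        for v, a in zip(speeds[1:], accels)
    )
    n = len(accels)
    return {cell: count / n for cell, count in cells.items()}


def distribution_difference(dist_a, dist_b):
    """Sum of absolute differences between two joint distributions (assumed measure)."""
    cells = set(dist_a) | set(dist_b)
    return sum(abs(dist_a.get(c, 0.0) - dist_b.get(c, 0.0)) for c in cells)


# Hypothetical traces: a short trip and a stationary vehicle.
trip = joint_distribution([0, 5, 10, 10, 5, 0])
idle = joint_distribution([0, 0, 0])
print(distribution_difference(trip, trip), distribution_difference(trip, idle))
```

Identical traces give a difference of zero, and two traces that share no cell give the maximum value of 2 under this measure.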
As Figure 9 shows, the difference between the joint speed-acceleration distributions of the experimental data and of the improved clustering algorithm in this paper is within ±1.2%, and the calculated distribution difference value (SAFD
For comparison, this paper also applies the driving cycle construction methods of references [17] and [18]. Using the data of this paper, the improved principal component algorithm is combined with each of the four clustering algorithms in turn, and 20 experiments are performed, as shown in Figure 10. The results show that the improved K-means clustering algorithm in this paper not only weakens the influence of noise points on the initial centers, but also greatly shortens the clustering time while keeping the clustering effect stable.
The results computed in Matlab are shown in Table 2. The comparison of the four algorithms in terms of the number of misclustered samples, average running time, average accuracy and SAFD
Table 2. Comparison of the experimental results of the four methods
Clustering method | Number of misclustered samples | Average running time (s) | Average accuracy (%) | SAFD |
---|---|---|---|---|
K-means | 184 | 260.5 | 89 | 1.98 |
Reference [17] | 121 | 202.75 | 97 | 1.54 |
Reference [18] | 98 | 181.5 | 99 | 1.25 |
The algorithm in this paper | 101 | 145.25 | 98 | 1.05 |
As shown in Figures 11 and 12, instantaneous fuel consumption is large at low and medium-low speed, and torque fluctuates more in these regions than in the high-speed region. The instantaneous fuel consumption rate is relatively stable in the high-speed region and clearly higher in the low-speed and medium-speed regions. Figure 13 shows that instantaneous fuel consumption rises briefly at low speed, after which its fluctuation roughly follows the driving speed. Figure 14 shows that, under driving conditions, engine speed is mainly distributed in 1500–2500 r/min and accelerator pedal opening is concentrated in 0.12–0.18, indicating that the vehicle is in a medium-high-speed state.
Figure 15 shows that instantaneous fuel consumption is mostly concentrated at engine speeds of 1000–1500 r/min and torque percentages of 10%–30%, which indicates that this part consists of high-speed, medium-speed and low-speed driving conditions. There are also a few relatively concentrated areas in the speed range of 1500–2500 r/min. This part is the instantaneous fuel consumption generated at high engine speed and low torque percentage, which may be caused by extreme operation by the driver.
This paper proposes a combined optimization algorithm that pairs improved principal component analysis with feature-weighted K-means clustering, and introduces the residual point clustering mean method to eliminate outliers and reduce clustering time. The maximum-minimum distance method optimizes the candidate initial centers so that K-means avoids falling into a local optimum and achieves a good clustering effect. The initial feature weights are obtained from the contribution rate of each characteristic parameter's contribution factor to the clustering, and a weighted Euclidean distance metric is proposed. Characteristic parameters with larger contribution factors, such as cruise time ratio, travel distance and average speed, are selected and given increased weight in the cluster analysis used to construct the vehicle driving cycle. The improved clustering algorithm proposed in this paper still has room for improvement. A weighted density K-means clustering algorithm could be built on the algorithm in this paper. Outliers could also be removed directly in the data preprocessing stage to reduce the running time of the subsequent clustering, and feature information of more dimensions could be added.