Moving object tracking has been studied for nearly 20 years. Its research content mainly covers moving object detection and extraction, moving object tracking, moving object recognition, and moving object behavior analysis and understanding, and it is an important branch of computer vision. Over this period, with the development of computer communication, computing and information technology, and with the continuous updating of computer hardware, including image processing hardware, moving target tracking has developed rapidly, and the results of related research have been widely applied.
Basic theories such as recognition and artificial intelligence are in turn the focus of this theoretical research. Moving target tracking was first applied in video surveillance systems, and has since been widely developed and applied in practical fields such as national defense, intelligent robots and intelligent traffic control.
Moving target tracking, a hot research field in digital image processing and computer vision, has great economic benefit and application value for production and daily life, and has attracted a large number of researchers. This paper takes intelligent video surveillance as its application background and focuses on the occlusion and multi-scale problems that arise when tracking targets in dynamically changing scenes.
The main research content is tracking a moving target against a dynamic background. The histogram of oriented gradients (HOG) and the histogram of the Lab color space are extracted from the target, and the two features are fused. The work mainly involves several core modules: extraction of the target's texture features, implementation of the target tracking algorithm, and training of the classifier that is used to determine the target location from the tracking results. The design uses the results already produced by the tracking algorithm to give complete information on the target's motion state, determine the tracked target, and complete the experimental analysis.
There are two kinds of target tracking algorithms: generative and discriminative. The kernelized correlation filter (KCF) algorithm belongs to the latter and follows the same principle as other discriminative algorithms. In its implementation, the classifier is replaced by a filter that judges the position of the tracked target. The classifier is still trained with positive and negative samples. The target region is generally regarded as a positive sample and the area around the target as a negative sample. The closer a location in the image is to the target, the more likely the sample taken there is to be a positive sample.
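The last point, that samples closer to the target count as "more positive", is often encoded in correlation filter trackers as a Gaussian-shaped label map rather than hard 0/1 labels. The following is a minimal sketch of that idea under stated assumptions: the function name `gaussian_labels` and the bandwidth `sigma` are illustrative and are not taken from this paper.

```python
import math

def gaussian_labels(height, width, sigma):
    """Soft labels: 1.0 at the patch centre, decaying with distance from it.

    `sigma` is a hypothetical bandwidth chosen for illustration only.
    """
    cy, cx = height // 2, width // 2
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            d2 = (y - cy) ** 2 + (x - cx) ** 2
            row.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        labels.append(row)
    return labels

# Toy 5 x 5 sampling region centred on the target.
labels = gaussian_labels(5, 5, sigma=1.0)
```

With soft labels of this kind there is no sharp boundary between positive and negative samples; the label simply falls off as the sampling position moves away from the target center.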
In the next frame of the video sequence, a region of size M × N is extracted at the position where the target appeared in the previous frame and weighted with a cosine window to serve as the sample. The HOG feature map is then computed, and its dimensions, denoted Z1, Z2, Z3, Z4, …, ZN, are taken as the sample input. The elements of the kernel matrix between the test sample and the training sample in the kernel space are obtained with formula (2). These elements together form the kernel matrix Kz, formula (3), from which Kxz is obtained, which is the first row of the transpose of the matrix Kz.
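Formulas (2) and (3) are not reproduced in this section, so the following is only an illustrative sketch of why a single row is enough to describe the kernel matrix. It assumes a Gaussian kernel and one-dimensional toy samples; the helper names `cyclic_shift`, `gaussian_kernel` and `kernel_matrix` and the value of `sigma` are hypothetical.

```python
import math

def cyclic_shift(v, s):
    """Shift the 1-D sample v cyclically by s positions."""
    n = len(v)
    return [v[(t - s) % n] for t in range(n)]

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between two samples of equal length (assumed kernel)."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / sigma ** 2)

def kernel_matrix(x, z, sigma):
    """Kernel values between every cyclic shift of x and every cyclic shift of z."""
    n = len(x)
    return [[gaussian_kernel(cyclic_shift(x, i), cyclic_shift(z, j), sigma)
             for j in range(n)] for i in range(n)]

x = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]   # toy training sample
z = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0]   # toy test sample: x shifted by one position
K = kernel_matrix(x, z, sigma=2.0)
first_row = K[0]

# The matrix is circulant: entry (i, j) depends only on (j - i) mod n,
# so the first row determines every other row.
is_circulant = all(
    abs(K[i][j] - first_row[(j - i) % len(x)]) < 1e-12
    for i in range(len(x)) for j in range(len(x))
)
# The largest entry of the first row marks the shift that best aligns z with x.
best_shift = max(range(len(first_row)), key=lambda j: first_row[j])
```

Because of this circulant structure, a tracker of this family only needs to compute and store one row, which is what makes evaluation in the Fourier domain efficient.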
The algorithm designed in this paper is divided into two modules: the tracker target detection and initialization module, and the tracker model update module. The detailed principle of the algorithm and the intermediate results of its implementation are shown in Figure 1.
The first module processes each frame of the video together with the tracking box of the target calibrated in that frame. It extracts the fused feature information of the target image (HOG + Lab color features), samples near the center of the target box, and trains a model that can calculate the response value at each pixel position in the image. At the same time, the Kalman filter is initialized when the first frame of the video is played, and the motion information of the tracked target is accumulated from the first frame onward, so that when the target is occluded its location can be determined from the predicted target position.
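The role of the Kalman filter during occlusion can be sketched as follows. This is a minimal constant-velocity filter for a single coordinate, written only to illustrate the predict/update cycle; the class name `AxisKalman`, the noise values and the unit time step are assumptions, and the paper's actual filter configuration is not given in this section.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (state: position, velocity).

    Noise levels and the frame interval dt = 1 are illustrative assumptions.
    """

    def __init__(self, position, process_noise=0.01, measurement_noise=0.1):
        self.p, self.v = float(position), 0.0
        self.P = [[100.0, 0.0], [0.0, 100.0]]  # large initial uncertainty
        self.q, self.r = process_noise, measurement_noise

    def predict(self):
        """Advance one frame and return the predicted position."""
        (a, b), (c, d) = self.P
        self.p += self.v
        # P <- F P F^T + Q with F = [[1, 1], [0, 1]]
        self.P = [[a + b + c + d + self.q, b + d],
                  [c + d, d + self.q]]
        return self.p

    def update(self, measured):
        """Correct the state with a measured position from the tracker."""
        (a, b), (c, d) = self.P
        s = a + self.r                 # innovation variance
        k0, k1 = a / s, c / s          # Kalman gain
        residual = measured - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]

# Target centre x moves 2 px per frame while it is visible.
kx = AxisKalman(position=0.0)
for measured_x in [2.0, 4.0, 6.0, 8.0]:
    kx.predict()
    kx.update(measured_x)

# During occlusion there is no reliable measurement, so only predict() is called.
occluded_x = [kx.predict() for _ in range(3)]
```

A second instance of the same filter would handle the y coordinate. While the target is visible the tracker's measurements keep correcting the motion estimate, and during occlusion the prediction alone supplies a plausible target position.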
The second module first checks, when the video sequence is about to play the next frame, whether the position of the tracking box from the previous frame is abnormal, for example out of bounds. If so, the position of the tracking rectangle is adjusted. At the same time, it detects whether the original size (scale) of the target tracking box has changed. If there is no change, samples are taken near the target position of the previous frame and correlated with the image using the trained model, and the response value at each sample point is saved. The location with the highest response value that also meets the given threshold condition is taken as the center of the target at this time. The total peak value of the tracking box at that target location is then calculated with a specific function. Next, the preset scale step, which acts as a multiplier, is used to enlarge and shrink the rectangular box, giving one larger and one smaller scale. The total peak value is calculated for each of these two scales and compared with the peak value obtained before, and the scale with the largest peak is selected. Finally, the position of the target tracking box in this frame is adjusted using the selected scale and the target center obtained previously, and the model is then trained and updated for the correlation between the sampling position and the image in the next frame.
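The scale comparison described above can be sketched as a small selection routine. This is a sketch under stated assumptions: `peak_at_scale` stands in for the paper's unspecified peak function, and the names `select_scale` and `scale_step` are illustrative.

```python
def select_scale(current_scale, scale_step, peak_at_scale):
    """Compare the response peak at the current, smaller and larger scale.

    `peak_at_scale` is a hypothetical callable that returns the total peak
    value of the tracking box evaluated at a given scale.
    """
    # The current scale comes first so that it is kept when peaks are tied.
    candidates = [current_scale,
                  current_scale / scale_step,
                  current_scale * scale_step]
    peaks = [peak_at_scale(s) for s in candidates]
    best = max(range(len(candidates)), key=lambda i: peaks[i])
    return candidates[best], peaks[best]

# Toy peak function whose maximum lies at scale 1.05 (the target grew slightly).
toy_peak = lambda s: 1.0 - abs(s - 1.05)
new_scale, new_peak = select_scale(1.0, 1.05, toy_peak)
```

Only three candidate scales are evaluated per frame, so the cost of scale adaptation stays small, at the price of following large scale changes over several frames rather than in one step.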
The core of the HOG feature is to build the contour information of the tracked target from histograms computed over a dense grid of local regions. For an image I(x, y), the HOG feature of the tracked target is extracted in this paper as follows:
In this paper, PCA-HOG features, that is, HOG features reduced in dimension by principal component analysis (PCA), are used to describe the target. If the resolution of the target region is 100 × 100, the target image contains 10,000 pixels from which features must be computed, which is a heavy burden for the algorithm. PCA dimensionality reduction is therefore adopted to minimize the number of feature dimensions that represent the target and to improve the performance of the algorithm.
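The gradient-orientation histogram at the heart of HOG can be sketched for a single cell as follows. This is a simplified illustration, not the paper's extraction procedure: it assumes 9 unsigned orientation bins over 0° to 180°, uses central differences on interior pixels only, and omits bin interpolation, block normalization and the PCA step. The function name `cell_orientation_histogram` is hypothetical.

```python
import math

def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations in one cell."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# Intensity ramps: brightness grows along x (first cell) or along y (second cell).
ramp_x = [[10 * x for x in range(4)] for _ in range(4)]
ramp_y = [[10 * y for _ in range(4)] for y in range(4)]
hist_x = cell_orientation_histogram(ramp_x)
hist_y = cell_orientation_histogram(ramp_y)
```

The two toy cells put all of their gradient energy into different bins, which is how the histogram captures the direction of local contours independently of absolute brightness.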
The Lab color model is a color model published by the CIE (International Commission on Illumination) in 1976. One characteristic of this model is that it is independent of the color display capability of the device. Lab makes up for the deficiency of the earlier RGB and CMYK color models, which depend on the color characteristics of the device. In addition, compared with the RGB and CMYK models, the Lab model represents the widest range of colors, so colors in the RGB and CMYK models can be converted to the Lab model.
The Lab color space is shown in Figure 1, with three channels in total: L, a and b. Channel L represents luminance. Channel a changes from green to red in the direction from negative to positive, and channel b changes from blue to yellow in the direction from negative to positive.
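The meaning of the three channels can be checked with a direct conversion. The sketch below assumes 8-bit sRGB input and the D65 reference white, which this section does not specify; the function name `srgb_to_lab` is illustrative, and a library routine would normally be used in practice.

```python
def srgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB color to CIE Lab (D65 white point assumed)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB to CIE XYZ
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)

red = srgb_to_lab(255, 0, 0)       # a is positive
green = srgb_to_lab(0, 255, 0)     # a is negative
blue = srgb_to_lab(0, 0, 255)      # b is negative
yellow = srgb_to_lab(255, 255, 0)  # b is positive
white = srgb_to_lab(255, 255, 255)
```

Pure red and green land on opposite sides of the a axis and pure blue and yellow on opposite sides of the b axis, while white has maximum L and nearly zero a and b.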
Because the Lab model is independent of the color display capability of the device, it is widely used in color image retrieval. In addition, to retain a wide color gamut and rich color after image processing, an image can be processed in the Lab color model and finally converted to the RGB model (for display) or the CMYK model (for printing) for output as required. Processing in this way lets the output image use richer, higher-quality color.
To sum up, as a feature, the Lab color model is highly sensitive to color but has low sensitivity to deformation and motion blur, so it can be applied to tracking targets that deform or move.
The kernel correlation filtering algorithm uses the Fourier transform, which is commonly used to convert between the time domain and the frequency domain. When digital signals (including images) of unbounded length are measured and processed, it is not possible to analyze the whole signal, only a finite segment extracted from it. Suppose the signal is a one-dimensional sinusoid. If the length of the intercepted segment is not a positive integer multiple of the period of the sinusoid, the periodic extension of the intercepted segment is discontinuous at its boundaries, and this is called aperiodic interception. It causes the spectrum of the signal to leak, which appears mainly as a “trailing” phenomenon in the spectrum. This leakage can be reduced by applying a window function. To illustrate spectrum leakage and the effect of a window function more vividly, the spectrum of the Cameraman image from the standard digital image test set is used as a demonstration.
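The one-dimensional case described above can be reproduced numerically. The sketch below uses a direct DFT on 64 samples and compares a sinusoid with exactly 8 periods, one with 8.5 periods, and the latter multiplied by a Hann (cosine) window; the signal length, the frequencies and the choice of bin 20 as a "far" bin are arbitrary illustrative values.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitudes of the first n/2 DFT bins, computed directly from the definition."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def sinusoid(cycles, n):
    """n samples of a sine wave that completes `cycles` periods in the segment."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]

def hann(n):
    """Hann (raised cosine) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]

def far_bin_ratio(signal, far_bin=20):
    """Energy that leaked into a bin far from the true frequency, relative to the peak."""
    mags = dft_magnitudes(signal)
    return mags[far_bin] / max(mags)

n = 64
periodic = sinusoid(8, n)        # whole number of periods: no leakage
aperiodic = sinusoid(8.5, n)     # aperiodic interception: leakage
windowed = [s * w for s, w in zip(aperiodic, hann(n))]

leak_periodic = far_bin_ratio(periodic)
leak_aperiodic = far_bin_ratio(aperiodic)
leak_windowed = far_bin_ratio(windowed)
```

The periodic segment leaves essentially nothing in the far bin, the aperiodic segment leaks a visible fraction of its peak there, and the windowed segment leaks much less, although not zero.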
As shown in Figure 2, the image on the right is the spectrum of the image on the left. There is a thin white line along the vertical axis of the spectrum, which is caused by spectrum leakage. As mentioned earlier, when a finite segment is intercepted for digital signal analysis, spectrum leakage occurs if the interception is aperiodic. The same condition generalizes to two-dimensional signals: when the Fourier transform is used, the sampled target patch is implicitly extended periodically in both the x and y directions. In Figure 3, the sky above the roof of the vehicle is bright, but the road under the wheels is darker, so the periodic extension in the y direction produces an abrupt gray-level change, and a white line in the y direction appears in the spectrum. Along the longitudinal edges, the gray-level change is not very large, so the white line in the x direction is not very obvious, which indicates that spectrum leakage in the x direction is also not very obvious. In general, however, the presence of white lines indicates leakage in the image spectrum.
To solve this problem, this demonstration applies a cosine window (a window function) to reduce spectrum leakage. The image with the cosine window applied is shown in Figure 4, and the resulting change in the spectrum is shown in Figure 5. After the cosine window is applied to the image, spectrum leakage along the y axis is much less obvious (the white line becomes fainter), and the energy in the spectrum is more concentrated and clearer than before. It should be emphasized, however, that a window function can only reduce spectrum leakage; it cannot make it disappear.
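For a two-dimensional patch, a cosine window is commonly built as the outer product of two one-dimensional Hann windows. The sketch below is an illustration under that assumption; the function names and the toy 5 × 5 patch (bright upper rows, dark lower rows) are hypothetical.

```python
import math

def cosine_window_2d(height, width):
    """2-D cosine window as the outer product of two 1-D Hann windows."""
    wy = [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (height - 1))) for i in range(height)]
    wx = [0.5 * (1.0 - math.cos(2.0 * math.pi * j / (width - 1))) for j in range(width)]
    return [[wy[i] * wx[j] for j in range(width)] for i in range(height)]

def apply_window(patch, window):
    """Multiply a patch by a window element by element."""
    return [[p * w for p, w in zip(prow, wrow)] for prow, wrow in zip(patch, window)]

# Toy patch: three bright rows on top (sky), two dark rows below (road).
patch = [[200.0] * 5 for _ in range(3)] + [[50.0] * 5 for _ in range(2)]
windowed_patch = apply_window(patch, cosine_window_2d(5, 5))
```

After windowing, every border pixel is pulled to zero, so the top and bottom rows agree when the patch is extended periodically and the abrupt gray-level jump at the boundary is removed, while the center of the patch is left unchanged.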
To sum up, the function of the cosine window in the KCF algorithm is to improve the reliability of the samples collected for training the model, and to avoid training a wrong model from the inaccurate samples produced by cyclic shifts.
Scale space theory: when the form of any object in daily life is observed, there is a certain measurement standard, and this standard is called scale. The scale of an object can be understood from many aspects. The weight of an object can be described in “gram”, “kilogram” or “ton”, and the size of an object in “meter”, “kilometer”, “millimeter” or even “nanometer”. When an unknown scene is analyzed with machine vision, the computer does not know in advance the appropriate scale of the target to be tracked in the image. It is therefore necessary to use a multi-scale image description (that is, a description of the image at all scales) to find the best scale for the object of interest. Scale invariance should be maintained while the optimal scale is obtained: the tracked target should have the same characteristic key points at different scales, so that the same key points can be detected and matched for image input at different scales.
In the experiment, the OTB-100 dataset was used, which is a set of image sequences designed for evaluating target tracking algorithms. The dataset consists of 100 folders. Each folder contains a sequence of video frames and a ground-truth reference file, groundtruth_rect.txt, which records the actual position of the target's tracking box in each frame of the sequence.
In the experiments, 46 video frame sequences were extracted from the dataset to test the algorithm in this paper. These sequences contain various tracking challenges, such as scale change, target occlusion and background interference. The algorithm is written in C++ and implemented in Visual Studio 2017 with the OpenCV 3.4.1 and opencv-contrib libraries. The algorithm first reads the actual position of the target's tracking box in the first frame from the ground-truth file of the video sequence, and then determines the tracking box position in the subsequent video frames on its own. The occlusion threshold was initially set to 0.25. The experimental platform was the Windows 10 operating system with an Intel i5-6200U 2.40 GHz processor, 8 GB of RAM and 2 GB of dedicated video memory. The C++ program built by static Release compilation runs at a maximum frame rate of 94 fps.
The 46 video frame sequences above were used to test the algorithm in this paper and tracking algorithms of the same type. The tracking results and performance curves produced by the algorithms were analyzed qualitatively and quantitatively with the Visual Tracker Benchmark, which is implemented in MATLAB. To integrate the algorithm into the benchmark tool, its executable file was generated in Visual Studio using static Release compilation and copied to the specified folder. A calling interface file was written in MATLAB that invokes the executable through a DOS command, and the implementation files of the other algorithms were also added to the specified folder. The video sequences and tracking algorithms to be evaluated were set in the configuration file. The tracking box drawing and performance curve drawing functions were then executed to show the tracking box results and the tracking performance curves of the different algorithms.
Because the 46 test sequences are too many to show individually, four of the most challenging sequences are selected, namely singer1, singer2, blurFace and freeman4, which between them contain deformation, illumination change, motion blur, fast motion, background clutter, out-of-plane rotation and other tracking challenges. The tracking results of the selected algorithms on these four sequences are analyzed and compared. Six filtering tracking algorithms are compared with the MulitpleKCF algorithm of this paper: CXT, KCF, Struck, CSK, DSST and STC.
Table 1 lists the tracking challenges present in the four video sequences. A challenge attribute that is present is marked √, and one that is absent is marked ×. The tracking results drawn with the Visual Tracker Benchmark tool for this algorithm and the other algorithms on these four video sequences are shown and analyzed below.
Table 1. Challenges in different video sequences
Video | Deformation | Illumination variation | Motion blur | Fast motion | Background clutter | Out-of-plane rotation | In-plane rotation |
---|---|---|---|---|---|---|---|
freeman4 | × | × | × | × | × | √ | √ |
singer1 | × | √ | × | × | × | √ | × |
singer2 | √ | √ | × | × | √ | √ | √ |
blurFace | × | × | √ | √ | × | × | √ |
Accuracy chart: The center position error is one of the evaluation criteria for the accuracy of a tracking algorithm. It is the Euclidean distance between the manually calibrated real position of the target and the center of the rectangular box obtained by the target tracking algorithm. For each frame of the video sequence, let the center of the manually calibrated rectangular box be (x, y) and the center of the rectangular box determined by the tracker be (z, w). The center position error d of this frame is then calculated by formula (6):

$$d = \sqrt{(x - z)^2 + (y - w)^2} \tag{6}$$
Success rate chart: Another criterion for the accuracy of a tracking algorithm is the bounding box overlap rate. The overlap rate S is calculated as shown in formula (7):
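Both per-frame measures can be sketched in a few lines. The sketch assumes boxes stored as (x, y, width, height) with (x, y) the top-left corner, and assumes that the overlap rate is the usual intersection-over-union, S = |A ∩ B| / |A ∪ B|, as used by the Visual Tracker Benchmark; the function names are illustrative.

```python
import math

def center_error(box_a, box_b):
    """Euclidean distance between the centres of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx, by = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    return math.hypot(ax - bx, ay - by)

def overlap_rate(box_a, box_b):
    """Intersection area divided by union area of two (x, y, w, h) boxes."""
    iw = min(box_a[0] + box_a[2], box_b[0] + box_b[2]) - max(box_a[0], box_b[0])
    ih = min(box_a[1] + box_a[3], box_b[1] + box_b[3]) - max(box_a[1], box_b[1])
    intersection = max(0.0, iw) * max(0.0, ih)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - intersection
    return intersection / union if union > 0 else 0.0

ground_truth = (0.0, 0.0, 10.0, 10.0)
tracked = (5.0, 0.0, 10.0, 10.0)     # same size, shifted 5 px to the right
error = center_error(ground_truth, tracked)
overlap = overlap_rate(ground_truth, tracked)
```

For this toy pair the tracked box is displaced by half its width, which gives a center error of 5 pixels and an overlap of one third.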
OPE (one-pass evaluation), a traditional evaluation method for tracking algorithms, is adopted in this paper. Accuracy and success rate curves are drawn from the rectangular box positions produced by the different algorithms on the 46 selected video sequences, giving the overall accuracy plots and success rate plots of the different tracking algorithms.
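How per-frame values turn into curves can be sketched as follows. The sketch assumes the convention commonly used with the Visual Tracker Benchmark: a frame counts toward the accuracy (precision) curve when its center error is at most the threshold, and toward the success rate curve when its overlap exceeds the threshold. The function names and the toy values are illustrative, not results from this paper.

```python
def accuracy_curve(center_errors, thresholds):
    """Fraction of frames whose centre error is within each pixel threshold."""
    n = len(center_errors)
    return [sum(e <= t for e in center_errors) / n for t in thresholds]

def success_curve(overlaps, thresholds):
    """Fraction of frames whose overlap rate exceeds each overlap threshold."""
    n = len(overlaps)
    return [sum(o > t for o in overlaps) / n for t in thresholds]

# Toy per-frame results for one tracker on a four-frame sequence.
toy_errors = [2.0, 10.0, 25.0, 40.0]
toy_overlaps = [0.9, 0.6, 0.3, 0.0]
accuracy = accuracy_curve(toy_errors, thresholds=[5, 20, 50])
success = success_curve(toy_overlaps, thresholds=[0.0, 0.5, 0.8])
```

The accuracy curve rises as the pixel threshold is relaxed, and the success rate curve falls as the required overlap increases, so a tracker whose curve lies above another's performs better at every threshold.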
The performance curves of the different algorithms under the different tracking challenges can be analyzed from two perspectives. First, whether for the success rate or accuracy plots of the overall video sequences or of individual challenges, the meaning of the horizontal and vertical axes and the legend entry for each tracking algorithm show that, as the threshold on the abscissa changes, the curves of the proposed algorithm lie above those of the other filtering algorithms, so its tracking accuracy and success rate are better than those of the other algorithms.
Second, in the upper right corner of each success rate or accuracy plot, the legend lists the algorithms in ranking order together with their corresponding success rate or accuracy values. The calculation method was described at the beginning of this section and is not repeated here. Across all the success rate plots, the success rate of the first-ranked tracking algorithm (the algorithm in this paper) is generally 1% to 30% higher than that of the second-ranked algorithm under the different tracking challenges. For some challenges, namely in-plane rotation, occlusion and scale variation, the margin reaches 34.2%, 42% and 62.4% respectively. Across all the accuracy plots, the first-ranked tracking algorithm is generally 1% to 20% more accurate than the second-ranked algorithm under the different tracking challenges.
Analyzing the curves for the various challenges from these two perspectives, it can be concluded that the algorithm proposed in this paper outperforms the other filtering algorithms it is compared with on both performance indicators, success rate and accuracy.