
Research background and significance

Moving object tracking technology has nearly 20 years of research history. Its research content mainly involves moving object detection and extraction, moving object tracking, moving object recognition, and moving object behavior analysis, and it is an important branch of computer vision research. During this period, with the development of computer communication technology, computing technology, and information technology, as well as the continuous updating of computer hardware, including image processing hardware, moving target tracking technology has also developed rapidly, and its research results have been widely applied.

Basic theories such as recognition and artificial intelligence are, in turn, the focus of this theoretical research. Moving target tracking technology was initially applied in video surveillance systems, and it has since been widely developed and applied in practical fields such as national defense, intelligent robotics, and intelligent traffic control.

As a hot research field in digital image processing and computer vision, moving target tracking has great economic benefit and application value in human production and life, and has attracted a large number of researchers. This paper takes intelligent video surveillance as its application background and focuses on the occlusion and multi-scale problems of tracking dynamic targets in changing scenes.

The main research content is to track a moving target against a dynamic background. The histogram of oriented gradients (HOG) and a histogram in the Lab color space are extracted from the target information, and the two features are fused. The work mainly involves several core modules: extracting the texture features of the target, implementing the target tracking algorithm, training a classifier on the tracking results, and using it to determine the target location. The tracking results already produced by the algorithm are used to give complete information on the target's motion state, determine the tracked target, and complete the experimental analysis.

The kernel correlation filter algorithm

Target tracking algorithms fall into two categories: generative and discriminative. The kernel correlation filter (KCF) algorithm belongs to the latter and follows the discriminative principle, except that in its implementation the classifier is replaced by a filter that judges the position of the tracked target. The classifier is still trained with positive and negative samples: the target region is generally regarded as a positive sample and the area around the target as negative samples, and the closer a location in the image is to the target, the more likely it is to be treated as a positive sample.

Characteristics of kernel correlation filter algorithm

1) A circulant matrix is built by cyclically shifting the area around the tracked target, positive and negative samples are collected, and ridge regression is used to train the filter. Since a circulant matrix can be diagonalized by the Fourier transform matrix, operations on it reduce to Hadamard products of vectors (element-wise products), which simplifies the computation and improves the speed and real-time performance of the algorithm.

2) A kernel function can be used to map the ridge regression from vector space to a non-vector (kernel) space. When solving the dual problem or handling conventional constraints in this space, the same diagonalization property of circulant matrices under the Fourier transform can again be used to simplify the computation.

3) Compared with single-channel features (such as grayscale), the kernel correlation filter can adopt multi-channel gradient-histogram features such as HOG, or three-channel color features such as RGB and HSV, which provides a way for the algorithm in this section to integrate multi-channel features. Two mathematical models are used in the kernel correlation filter algorithm.
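For reference, the standard algebra behind point 1) is as follows (this is the general kernel correlation filter derivation from the literature, not notation specific to this paper). A circulant matrix $X$ built from a base sample $x$ is diagonalized by the discrete Fourier transform matrix $F$:

$$X = F\,\mathrm{diag}(\hat{x})\,F^{H}, \qquad \hat{x} = \mathcal{F}(x)$$

so the ridge regression solution $w = (X^{H}X + \lambda I)^{-1}X^{H}y$ reduces to the element-wise expression

$$\hat{w} = \frac{\hat{x}^{*} \odot \hat{y}}{\hat{x}^{*} \odot \hat{x} + \lambda}$$

which replaces a matrix inversion with Hadamard products and Fourier transforms.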

Implementation process of kernel correlation filter algorithm

1) Select the position of the target to be tracked in the first frame of the video sequence; the selected rectangular area is enlarged by a factor of 2.5, and its size is recorded as M × N.

2) Perform a cosine-window weighting operation on the rectangular box framing the target position, then compute the HOG feature operator to obtain an n-dimensional HOG feature map. Each dimension of the feature is treated as a sample of size M × N, and the samples are denoted $x_1, x_2, x_3, \ldots, x_n$.

3) A two-dimensional Gaussian function is used to generate the label matrix $y$, whose size M × N is consistent with the samples.

4) Use the nonlinear regression model mentioned above to calculate $k^{xx}$: the HOG feature is used as input, and a nonlinear regressor is trained as shown in formula (1):

$$\alpha = \frac{y}{k^{xx} + \lambda} \tag{1}$$

In the next frame of the video sequence, an area of size M × N is extracted at the position where the target appeared in the previous frame, the sample is cosine-weighted, and the HOG feature map is computed; each dimension, denoted $z_1, z_2, z_3, \ldots, z_n$, is taken as sample input. The kernel matrix elements of the test samples and training samples in kernel space are obtained using formula (2); these elements together form the kernel matrix $K^z$, and $k^{xz}$, which is the first row of the transpose of $K^z$, is obtained as in formula (3):

$$k_{ij}^{z} = \phi(z_i)\,\phi(x_j)^{T} \tag{2}$$

$$k^{xz} = \phi(x)\,\phi(z)^{T} \tag{3}$$

5) Formula (4) is used to obtain the response matrix in the Fourier domain, and the response matrix $f(z)$ is then obtained through the inverse Fourier transform:

$$f(z) = k^{xz} \odot \alpha \tag{4}$$

6) The position of the maximum response value is found in the response matrix $f(z)$: if the response value of one position is greater than that of every other position in the response matrix, this position is the position of the target in the current frame. Otherwise, additional steps (such as re-running a global search match) can be taken to retrieve the target region, after which the process restarts from step 1.

7) The following steps mainly update the model. Samples are extracted from the newly found target location, and the operations of steps 2 to 5 are repeated. If the model computed for the current frame is denoted $\alpha'$ and the previous model $\alpha_{old}$, the model used in the next frame is obtained by interpolating between the two, as shown in formula (5):

$$\alpha_{new} = m\,\alpha_{old} + \left(1 - m\right)\alpha' \tag{5}$$

8) Here, $m$ represents the learning rate, with a value between 0 and 1, indicating the degree of influence of the previous model on the current one. A value that is too large or too small will cause tracking failure.
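To make the steps above concrete, the following is a minimal C++/OpenCV sketch of the training (formula (1)) and detection (formula (4)) computations. It is a simplification under stated assumptions: the paper uses multi-channel HOG + Lab features and a nonlinear kernel, whereas this sketch uses a single synthetic feature channel and a linear kernel, for which the Fourier transform of $k^{xx}$ is simply $|\hat{x}|^2$; all variable names are illustrative.

```cpp
// Minimal single-channel, linear-kernel KCF sketch: training (formula 1)
// and detection (formula 4) in the Fourier domain. Assumes OpenCV >= 3.x;
// the training patch is synthetic rather than cut from a video frame.
#include <opencv2/opencv.hpp>
#include <cmath>
#include <cstdio>
#include <vector>

// Divide a 2-channel complex spectrum by a real matrix, element-wise.
static cv::Mat divideSpectrumByReal(const cv::Mat& cf, const cv::Mat& r) {
    std::vector<cv::Mat> planes;
    cv::split(cf, planes);
    planes[0] /= r;
    planes[1] /= r;
    cv::Mat out;
    cv::merge(planes, out);
    return out;
}

int main() {
    const int M = 64, N = 64;
    const float lambda = 1e-4f, sigma = 10.f;

    // Stand-in for the feature map of the selected target region (steps 1-2).
    cv::Mat x(M, N, CV_32F);
    cv::randu(x, 0.f, 1.f);

    // Step 3: Gaussian label matrix y with its peak at the target centre.
    cv::Mat y(M, N, CV_32F);
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float d2 = (i - M / 2.f) * (i - M / 2.f) + (j - N / 2.f) * (j - N / 2.f);
            y.at<float>(i, j) = std::exp(-d2 / (2.f * sigma * sigma));
        }

    // Step 2: cosine (Hann) window to suppress spectrum leakage.
    cv::Mat hann;
    cv::createHanningWindow(hann, x.size(), CV_32F);
    x = x.mul(hann);

    cv::Mat xf, yf;
    cv::dft(x, xf, cv::DFT_COMPLEX_OUTPUT);
    cv::dft(y, yf, cv::DFT_COMPLEX_OUTPUT);

    // Linear kernel: the spectrum of k^{xx} is |x_hat|^2 (a real matrix).
    std::vector<cv::Mat> xp;
    cv::split(xf, xp);
    cv::Mat kxxf = xp[0].mul(xp[0]) + xp[1].mul(xp[1]);

    // Step 4, formula (1): alpha = y / (k^{xx} + lambda), in the Fourier domain.
    cv::Mat alphaf = divideSpectrumByReal(yf, kxxf + lambda);

    // Detection patch z: the training patch shifted by (5, 3) pixels.
    cv::Mat z, shift = (cv::Mat_<double>(2, 3) << 1, 0, 5, 0, 1, 3);
    cv::warpAffine(x, z, shift, x.size());

    cv::Mat zf, kxzf, respf, resp;
    cv::dft(z, zf, cv::DFT_COMPLEX_OUTPUT);
    cv::mulSpectrums(zf, xf, kxzf, 0, true);          // spectrum of k^{xz}
    cv::mulSpectrums(kxzf, alphaf, respf, 0, false);  // formula (4): k^{xz} (.) alpha
    cv::idft(respf, resp, cv::DFT_SCALE | cv::DFT_REAL_OUTPUT);

    // Step 6: the maximum of the response map gives the new target position.
    cv::Point peak;
    cv::minMaxLoc(resp, nullptr, nullptr, nullptr, &peak);
    std::printf("response peak at (%d, %d)\n", peak.x, peak.y);
    return 0;
}
```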

Multi-scale anti-occlusion target tracking algorithm based on kernel correlation filter

The algorithm designed in this paper is mainly divided into two modules: a tracker initialization and target detection module, and a tracker model update module. The detailed realization principle of the algorithm and the intermediate results of the realization process are shown in Figure 1.

Figure 1.

Lab model

The main process of the first module is as follows: each video frame, together with the tracking-box information calibrated in that frame, is used to extract the fused feature information of the target image (HOG + Lab color features); samples are selected near the center of the target box, and a model is obtained by training. This model can compute a response value for the pixel at each position of the image. At the same time, a Kalman filter is initialized when the first frame of the video is played, and the motion information of the target to be tracked is accumulated from the first frame onward, so that when the tracked target becomes occluded, its location can be determined from the predicted target position.
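As an illustration of the occlusion-prediction component, below is a minimal constant-velocity cv::KalmanFilter setup of the kind commonly used for this purpose. The paper does not specify its exact state model, so the state layout [x, y, vx, vy], the measurement [x, y], and the noise values here are all assumptions.

```cpp
// Constant-velocity Kalman filter for predicting the target centre
// (sketch; the state model and noise covariances are assumptions).
#include <opencv2/opencv.hpp>
#include <cstdio>

int main() {
    cv::KalmanFilter kf(4, 2, 0, CV_32F);  // state [x, y, vx, vy], measurement [x, y]
    const float dt = 1.f;                  // one frame per step
    kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
        1, 0, dt, 0,
        0, 1, 0, dt,
        0, 0, 1,  0,
        0, 0, 0,  1);
    cv::setIdentity(kf.measurementMatrix);                         // observe x, y only
    cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-2));
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));
    cv::setIdentity(kf.errorCovPost, cv::Scalar::all(1));

    // Initialize with the target centre from the first frame.
    kf.statePost = (cv::Mat_<float>(4, 1) << 120.f, 80.f, 0.f, 0.f);

    for (int frame = 1; frame <= 5; ++frame) {
        cv::Mat pred = kf.predict();  // usable as the target position under occlusion
        std::printf("frame %d: predicted centre (%.1f, %.1f)\n",
                    frame, pred.at<float>(0), pred.at<float>(1));
        // When the tracker's response is reliable (no occlusion), correct the
        // filter with the measured centre; here the measurement is synthetic.
        cv::Mat meas = (cv::Mat_<float>(2, 1) << 120.f + 2.f * frame, 80.f + 1.f * frame);
        kf.correct(meas);
    }
    return 0;
}
```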

The main process of the second module is as follows. When the video sequence is about to play the next frame, it first judges whether the position of the tracking box from the previous frame is abnormal, for example out of bounds, and if so adjusts the position of the tracking rectangle. It also detects whether the size (scale) of the original target tracking box has changed. If there is no change, samples are taken near the target position of the previous frame, the trained model is correlated with the image, and the response value of each sampled pixel is saved; the position with the highest response value that also satisfies a given threshold condition is taken as the target center at this time. The total peak value of the tracking box at the target location is then calculated by a specific function. Next, using the preset scale step as a multiplier, the rectangular box is scaled up and down to obtain two additional scales, one larger and one smaller; the total peak values at these two scales are calculated and compared with the peak obtained before, and the scale with the largest peak is selected. Finally, the position of the target tracking box in this frame is adjusted using the selected scale and the previously obtained target center, after which the model is trained and updated on samples from this position for use in the next frame.
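The scale selection described above reduces to evaluating the response peak at three scales and keeping the best one. The toy sketch below illustrates only that selection logic; the scale step of 1.05 and the synthetic respPeak() stand-in are assumptions (the paper computes the real peak by correlating the trained model with a patch resampled at each scale).

```cpp
// Toy three-scale search: test {1/step, 1, step} and keep the scale whose
// response peak is largest. respPeak() is a synthetic stand-in for running
// the trained filter on a resampled patch.
#include <cmath>
#include <cstdio>

static float respPeak(float scale) {
    // Pretend the true target scale factor is 1.08; the peak decays as the
    // tested scale moves away from it.
    return std::exp(-50.f * (scale - 1.08f) * (scale - 1.08f));
}

int main() {
    const float step = 1.05f;  // assumed scale step
    const float scales[3] = {1.f / step, 1.f, step};
    float best = -1.f, bestScale = 1.f;
    for (float s : scales) {
        float p = respPeak(s);
        if (p > best) { best = p; bestScale = s; }
    }
    // The tracking box for this frame is then rescaled by bestScale.
    std::printf("selected scale factor: %.3f (peak %.3f)\n", bestScale, best);
    return 0;
}
```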

The steps of HOG feature extraction are as follows

The core of the HOG feature is to construct the contour information of the tracked target through histograms computed over dense local regions. For an image I(x, y), the idea of the algorithm in this paper for extracting the HOG feature of the tracked target is as follows:

1) To reduce the influence of sudden illumination changes, environmental changes, and other factors, the image information needs to be normalized, for example by normalizing the Gamma space and color space. Proper compression of the image can effectively cope with environmental changes such as illumination and partial shadows. In addition, to improve the real-time performance of the algorithm, color images are generally converted to grayscale for processing.

2) Calculate the first-order gradient of the image. The first-order derivative operation not only weakens the interference of sudden illumination changes but also captures the contour information of the tracked target and the texture information of the image. The gradient magnitude and the corresponding gradient direction are calculated as

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}, \qquad \theta(x, y) = \arctan\frac{G_y(x, y)}{G_x(x, y)}$$

3) Perform gradient projection on the cells. This reduces sensitivity to changes in the appearance of the tracked target and extracts information from the local image. Before the final feature is obtained, the image is divided into several cells; the histogram of oriented gradients of the pixels in each cell is then counted and accumulated, and the final feature information is obtained through mapping.

4) Normalize all cells within each block. A cell in the image is usually shared by several different blocks, and because the normalization is performed per block, the result for a given cell differs between blocks. Normalizing over blocks further compresses the interference factors.

5) Collect the HOG features. HOG features are extracted from all overlapping blocks in the image to be recognized and combined into the final feature vector for recognition.

In this paper, PCA-HOG features, reduced in dimensionality through principal component analysis (PCA), are used to extract target features. If the resolution of the image in the target region is 100 × 100, the target image yields a 10,000-dimensional feature in total, which is a great burden for the algorithm. Therefore, PCA dimensionality reduction is adopted to minimize the number of feature components representing the target and improve the performance of the algorithm.
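A brief sketch of this pipeline, using OpenCV's stock HOG descriptor followed by PCA, is shown below. The window, block, and cell sizes and the number of retained components are illustrative assumptions rather than the paper's configuration.

```cpp
// Compute HOG descriptors for patches and reduce them with PCA (PCA-HOG).
// All sizes and the number of retained components are illustrative.
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main() {
    // Synthetic 64x64 grayscale patch standing in for the target region.
    cv::Mat patch(64, 64, CV_8U);
    cv::randu(patch, 0, 255);

    // HOG: 64x64 window, 16x16 blocks, 8x8 block stride and cells, 9 bins.
    cv::HOGDescriptor hog(cv::Size(64, 64), cv::Size(16, 16),
                          cv::Size(8, 8), cv::Size(8, 8), 9);
    std::vector<float> desc;
    hog.compute(patch, desc);
    std::printf("raw HOG dimension: %zu\n", desc.size());  // 1764 for these sizes

    // Collect descriptors from several patches as rows of a data matrix,
    // then keep only the leading principal components.
    const int numSamples = 20, keep = 16;
    cv::Mat data(numSamples, (int)desc.size(), CV_32F);
    for (int s = 0; s < numSamples; ++s) {
        cv::randu(patch, 0, 255);
        hog.compute(patch, desc);
        cv::Mat(desc).reshape(1, 1).copyTo(data.row(s));
    }
    cv::PCA pca(data, cv::noArray(), cv::PCA::DATA_AS_ROW, keep);
    cv::Mat reduced = pca.project(data);
    std::printf("PCA-HOG dimension: %d\n", reduced.cols);  // 16
    return 0;
}
```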

Lab color model

The Lab color model is a color model published by the CIE (International Commission on Illumination) in 1976. One of its characteristics is that it is independent of the color display capability of the device. The appearance of Lab makes up for the deficiency that the earlier RGB and CMYK color models must depend on the color characteristics of a device. In addition, compared with the RGB and CMYK models, one of the biggest characteristics of the Lab model is that the color range it represents is the widest, so the RGB and CMYK models can both be converted to the Lab model.

The Lab color space is shown in Figure 1 and has three channels in total: L, a, and b. Channel L represents luminance; channel a changes from green to red in the direction from negative to positive, and channel b changes from blue to yellow in the direction from negative to positive.

Because the Lab model is independent of the color display capability of the device, it is widely used in color image retrieval. In addition, if a wide color gamut and full color are to be retained after image processing, the image can be processed in the Lab color model and finally converted to the RGB model (for display) or the CMYK model (for printing) for output as required. This approach allows the processed image to be output with richer, higher-quality color.

To sum up, as a feature the Lab color model has high sensitivity to color but low sensitivity to deformation and motion blur, so it remains applicable when the tracked target deforms or moves.
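For reference, converting a frame to Lab and building per-channel histograms in OpenCV looks roughly as follows; the 16 bins per channel are an illustrative assumption, since the paper does not state its histogram configuration.

```cpp
// Convert a BGR frame to Lab and compute one histogram per channel.
// The bin count is an illustrative assumption.
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main() {
    cv::Mat bgr(64, 64, CV_8UC3);  // stand-in for a video frame / target patch
    cv::randu(bgr, cv::Scalar::all(0), cv::Scalar::all(255));

    cv::Mat lab;
    cv::cvtColor(bgr, lab, cv::COLOR_BGR2Lab);  // 8-bit Lab: all channels in [0, 255]

    std::vector<cv::Mat> ch;
    cv::split(lab, ch);

    int bins = 16;
    int channels[] = {0};
    float range[] = {0.f, 256.f};
    const float* ranges[] = {range};
    for (int c = 0; c < 3; ++c) {
        cv::Mat hist;
        cv::calcHist(&ch[c], 1, channels, cv::Mat(), hist, 1, &bins, ranges);
        std::printf("channel %d: %d-bin histogram computed\n", c, bins);
    }
    return 0;
}
```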

The Hanning window and the method of combining the above two features

The kernel correlation filtering algorithm uses the Fourier transform, which is commonly used for conversion between the time domain and the frequency domain. When measuring and calculating digital signals (including images) of infinite length, the whole signal cannot be analyzed; only finite content extracted from the signal can be. During signal interception, if the signal is a one-dimensional sinusoid and the intercepted portion does not span a positive integer multiple of the sinusoid's period, the intercepted signal is discontinuous at its boundary; such an interception is called aperiodic interception. It causes the spectrum of the signal to leak, mainly manifested as a "trailing" phenomenon in the spectrum. This leakage can be reduced by applying window functions. To illustrate spectrum leakage and the windowing effect more vividly, the spectrum of the Car2 image from the digital image standard test set is used as a demonstration.

As shown in Figure 2, the image on the right is the spectrum of the image on the left. A thin white line can be seen along the vertical axis of the spectrum, which is in fact caused by spectrum leakage. As mentioned earlier, when finite interception is performed for digital signal analysis, spectrum leakage occurs if the interception is aperiodic. Generalizing this to two-dimensional signals, the Fourier transform implicitly continues the sampled target patch periodically in both the x and y directions. As can be seen from Figure 3, the sky above the vehicle's roof is bright while the road under its wheels is darker, so the periodic continuation produces a gray-level jump in the y direction, and the white line in the y direction appears in the spectrum. Along the longitudinal edges the gray-level change is not large, so the white line in the x direction is not obvious, which also indicates that the spectrum leakage in the x direction is mild. In general, however, the presence of white lines indicates leakage in the image spectrum.

Figure 2.

The Car2 image and its spectrum

Figure 3.

Result and spectrum of the horizontal and vertical periodic continuation used by the Fourier transform

To solve this problem, this demonstration applies a cosine window (a window function). The effect of the image with the cosine window is shown in Figure 4, and the change of the spectrum is shown in Figure 5. After the cosine window is applied to the image, the spectrum leakage along the y-axis is much less obvious (the white line becomes fainter), and the energy in the spectrum is more concentrated and distinct than before. It should be emphasized, however, that a window function can only reduce spectrum leakage; it cannot make it disappear.

Figure 4.

Display effect of Car2 image with cosine window

Figure 5.

Spectrum of the Car2 image after windowing

To sum up, the function of the cosine window in the KCF algorithm is to improve the quality of the samples collected for training the model and to avoid training a wrong model on the inaccurate samples produced by cyclic shifting.
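The effect is easy to reproduce. The following sketch compares the log-magnitude spectrum of an image before and after applying a Hann (cosine) window; since the Car2 frame itself is not bundled here, a synthetic image with a strong top-to-bottom brightness ramp (bright "sky" above, dark "road" below) stands in for it.

```cpp
// Log-magnitude spectrum of an image with and without a Hann (cosine)
// window; the synthetic vertical ramp mimics the bright-sky / dark-road
// discontinuity that causes leakage along y in the Car2 example.
#include <opencv2/opencv.hpp>
#include <cstdio>

// Log-magnitude spectrum of a single-channel float image.
static cv::Mat logSpectrum(const cv::Mat& img) {
    cv::Mat f, planes[2], mag;
    cv::dft(img, f, cv::DFT_COMPLEX_OUTPUT);
    cv::split(f, planes);
    cv::magnitude(planes[0], planes[1], mag);
    cv::log(mag + 1.0f, mag);
    return mag;
}

int main() {
    cv::Mat img(128, 128, CV_32F);
    for (int i = 0; i < img.rows; ++i)
        img.row(i).setTo(i / 127.f);  // top-to-bottom brightness ramp

    cv::Mat before = logSpectrum(img);

    cv::Mat hann;
    cv::createHanningWindow(hann, img.size(), CV_32F);
    cv::Mat after = logSpectrum(img.mul(hann));

    // Off-peak spectral energy drops sharply once the window is applied,
    // i.e. the leakage is reduced but not eliminated.
    std::printf("spectral log-energy before: %.1f, after: %.1f\n",
                cv::sum(before)[0], cv::sum(after)[0]);
    return 0;
}
```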

Multiscale introduction and application in this algorithm

Scale space theory: when observing the form of any object in daily life, there is a certain measurement standard, and this standard is called scale. The scale of an object can be understood in many respects: its weight can be described in "grams", "kilograms", or "tons", and its size in "meters", "kilometers", "millimeters", or even "nanometers". When an unknown scene is analyzed with machine vision technology, the computer does not know in advance the appropriate scale of the target to be tracked in the image. Therefore, an appropriate multi-scale image description method (that is, describing the image at all scales) must be considered in order to obtain the best scale of the object of interest. Scale invariance should be maintained while obtaining the optimal scale: at different scales the tracked target has the same characteristic key points, so the same key points can be detected and matched for image inputs of different scales.

Experimental results and analysis
Experimental data set and development environment settings

In the experiments, the OTB-100 dataset was used, which is a set of image sequences specifically used for evaluating target tracking algorithms. The entire dataset consists of 100 folders. Each folder contains a set of video frame sequences and a ground-truth file, groundtruth_rect.txt, which records the actual position of the target tracking box as the video frame sequence is played.

In the experiments, 46 video frame sequences were extracted from the dataset to test the proposed algorithm. These video frame sequences contain various tracking challenges, such as scale change, target occlusion, and background interference. The algorithm is written in C++ and implemented in Visual Studio 2017 with the OpenCV 3.4.1 and opencv-contrib development libraries. The algorithm first reads the actual tracking box position of the target in the first frame from the ground-truth file of the video sequence, and thereafter determines the tracking box position of the target in each video frame on its own. The occlusion judgment threshold was initially set to 0.25. The experimental platform was the Windows 10 operating system with an Intel i5-6200U 2.40 GHz processor, 8 GB of RAM, and 2 GB of dedicated video memory. The C++ program, built as a statically compiled Release executable, runs at a maximum frame rate of 94 fps.

The above 46 video frame sequences were used to test the proposed algorithm and tracking algorithms of the same type. The tracking effects and performance curves generated by the algorithms were qualitatively and quantitatively analyzed with the Visual Tracker Benchmark, which is implemented in the MATLAB programming language. To integrate the algorithm into the benchmark tool, its executable file was generated with the Release build in Visual Studio and copied to the designated folder. A call-interface file written in MATLAB invokes the executable through DOS commands, and the implementation files of the other algorithms were also added to the designated folder. The video sequences and tracking algorithms to be evaluated are set in the configuration file. After that, the rectangle-box tracking and performance-curve drawing functions are executed to depict the tracking effects and tracking performance curves of the different algorithms.

Qualitative comparison with other tracking algorithms

Because a large number of video sequences, 46 in total, were used for testing, four of the most challenging tracking sequences are selected: singer1, singer2, blurFace, and freeman4. Among them they contain deformation, lighting change, blur, fast movement, cluttered background, out-of-plane rotation, and other tracking challenges. The tracking effects of the selected and compared tracking algorithms on these four sequences are analyzed and compared. At the same time, six filtering tracking algorithms are selected for comparison with the MultipleKCF algorithm proposed in this paper: CXT, KCF, Struck, CSK, DSST, and STC.

Table 1 presents the tracking challenges in the four video sequences: a challenge attribute is marked √ if present and × if not. The tracking effects of this algorithm and the other algorithms on these four video sequences, drawn with the Visual Tracker Benchmark tool, are shown and analyzed below.

Challenges in different video sequences

Video      Deformation   Illumination   Motion blur   Fast moving   Background clutter   Out-of-plane rotation   In-plane rotation
freeman4   ×             ×              ×             ×             ×                    √                       √
singer1    ×             √              ×             ×             ×                    √                       ×
singer2    √             √              ×             ×             √                    √                       √
blurFace   ×             ×              √             √             ×                    ×                       √

Figure 6.

Tracking effect of target in Freeman4 sequence

Figure 7.

Tracking effect of target in singer1 sequence

Figure 8.

Tracking effect of target in singer2 sequence

Figure 9.

Target tracking effect in the blurFace video sequence

Evaluation of the accuracy and success rate of the tracking algorithms

Accuracy chart: the center position error is one of the evaluation criteria used to measure the accuracy of a tracking algorithm. It is the Euclidean distance between the manually calibrated real position of the target and the center of the rectangular box obtained by the tracking algorithm. For each frame of the tracked video sequence, suppose the center coordinate of the manually calibrated rectangle is (x, y) and the center coordinate of the rectangle determined by the tracker is (z, w).

For this frame, the center position error is calculated by formula (6):

$$\varepsilon = \sqrt{\left(x - z\right)^2 + \left(y - w\right)^2} \tag{6}$$

Success rate chart: another criterion used to measure the accuracy of an algorithm is the bounding box overlap rate. The overlap rate S is calculated as shown in formula (7), where $R_G$ is the ground-truth box and $R_T$ the box produced by the tracker:

$$S = \frac{\left| R_G \cap R_T \right|}{\left| R_G \cup R_T \right|} \tag{7}$$
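Both metrics are straightforward to compute per frame. A minimal sketch with cv::Rect and made-up boxes:

```cpp
// Center location error (formula 6) and bounding-box overlap (formula 7)
// for a single frame; the two boxes are made-up example values.
#include <opencv2/core.hpp>
#include <cmath>
#include <cstdio>

int main() {
    cv::Rect2f gt(100, 80, 40, 60);   // ground-truth box R_G
    cv::Rect2f tr(108, 86, 42, 58);   // tracker output box R_T

    // Formula (6): Euclidean distance between the two box centres.
    cv::Point2f cg(gt.x + gt.width / 2, gt.y + gt.height / 2);
    cv::Point2f ct(tr.x + tr.width / 2, tr.y + tr.height / 2);
    float cle = std::hypot(cg.x - ct.x, cg.y - ct.y);

    // Formula (7): S = |R_G intersect R_T| / |R_G union R_T|.
    float inter = (gt & tr).area();
    float s = inter / (gt.area() + tr.area() - inter);

    std::printf("center error = %.2f px, overlap S = %.3f\n", cle, s);
    // A frame counts toward the success rate when S exceeds the plot's
    // threshold, and toward precision when cle falls below it (e.g. 20 px).
    return 0;
}
```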

OPE (one-pass evaluation), a traditional evaluation method for tracking algorithms, is adopted in this paper. Success and precision curves are drawn from the rectangular-box position results of the 46 selected video sequences tracked by the different algorithms, yielding the overall success plots and precision plots of the different tracking algorithms.

Figure 10.

Success rate and accuracy of all video sequences evaluated using OPE

Figure 11.

Success rate and accuracy of OPE evaluation under target deformation

Figure 12.

Success rate and accuracy of OPE evaluation under varying lighting conditions

Figure 13.

Success rate and accuracy of OPE evaluation in motion blur case

Figure 14.

Graph of success rate and accuracy evaluated using OPE in the case of fast movement

Figure 15.

Success rate and accuracy of OPE evaluation in cluttered background

Figure 16.

Success rate and accuracy of OPE evaluation for out-of-plane rotation

Figure 17.

Success rate and accuracy of OPE evaluation for in-plane rotation

Comparing the above performance curves of the different algorithms under the different tracking challenges, they can be analyzed from two perspectives. First, whether in the success rate or precision plots, and whether for the overall video sequence or for individual challenges, it can be seen from the meaning of the horizontal and vertical axes and from the legend corresponding to each tracking algorithm that, as the abscissa threshold varies, the curve of the proposed algorithm lies above those of the other filtering algorithms; its tracking accuracy and success rate are therefore better than those of the other algorithms.

In addition, each success rate or precision plot shows in its upper right corner, besides the legend indicating each algorithm's ranking, the corresponding success rate or precision value; the calculation method was given at the beginning of this section and is not repeated here. According to the statistics of all the success plots, under different tracking challenges the success rate of the first-ranked tracking algorithm (the algorithm in this paper) is generally 1%–30% higher than that of the second-ranked algorithm; for some tracking challenges, such as in-plane rotation, occlusion, and scale variation, the margin even reaches 34.2%, 42%, and 62.4%, respectively. For all the precision plots, the first-ranked tracking algorithm was generally 1%–20% more accurate than the second-ranked one under different tracking challenges.

Summary

Analyzing the curves for the various challenges from the above two points of view, it can be concluded that the algorithm proposed in this paper outperforms the other filtering algorithms compared with it in both performance indicators: success rate and accuracy.
