In recent years, UAVs have been widely used in the military field. Owing to their small size, low cost, low noise, high maneuverability, strong concealment, and superior stability, UAVs can be applied in many fields, for purposes such as reconnoitering enemy troops, detecting danger zones, tracking targets, electronic jamming, communication relay, and even carrying out attacks with small offensive weapons. Detecting non-cooperative UAVs is therefore necessary. A system that autonomously identifies aerial non-cooperative UAVs and monitors national border areas in real time would enable economical, wide-area border surveillance and contribute to national and social security.
To date, research on UAV detection by domestic and foreign scholars remains scarce. The main difficulty is that the appearance of a UAV in an image is easily affected by its position, viewing angle, distance from the camera, and its own structure, all of which make the UAV hard to detect. This problem closely resembles the general difficulties of target detection and requires further research and practice. At present, the common target detection methods are the optical flow method [1], the inter-frame difference method, and background-subtraction-based detection. Optical flow detection assigns a velocity vector to each moving pixel so that the image forms an optical flow field in which the background is clearly distinguished from the moving target vectors; the target location is then obtained by analyzing the dynamic image. This method works under all kinds of backgrounds and imposes no requirement on them, but the number of pixels involved is large, the optical flow field is widely and inaccurately distributed, and the computation is heavy. The inter-frame difference method [2] obtains the target foreground by thresholding the pixel-wise difference between two adjacent frames; its computation is light, but its recognition ability is poor when the illumination changes. To ensure the accuracy and completeness of UAV detection, the traditional sliding-window approach requires that the sample image and the image under test be repeatedly rescaled until the target in the test image is no larger than the template, at which point detection stops.
In this way a UAV larger than or equal to the template can also be detected accurately after the whole test image is resized. However, the HOG feature [3-5] must be extracted after each rescaling: a sliding window [3] has to traverse the entire image until the high-dimensional edge contour feature is matched, so the traversal takes a long time, feature descriptors are generated slowly, and every scale level requires a large amount of computation.
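As a concrete illustration of the inter-frame difference method discussed above, the following is a minimal sketch in Python with NumPy. The function name, threshold value, and synthetic frames are illustrative assumptions, not part of the method evaluated in this paper.

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Return a binary foreground mask from two consecutive grayscale frames.

    Pixels whose absolute intensity change exceeds `threshold` are marked
    as foreground (moving target); everything else is background.
    """
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

# Synthetic example (illustrative): a bright 4x4 "target" shifts 3 pixels right.
prev_frame = np.zeros((32, 32), dtype=np.uint8)
curr_frame = np.zeros((32, 32), dtype=np.uint8)
prev_frame[10:14, 10:14] = 200
curr_frame[10:14, 13:17] = 200
mask = frame_difference_mask(prev_frame, curr_frame)
```

The mask is nonzero only where the object was or now is, which is why the method is cheap but, as noted above, fragile under illumination change.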
This paper proposes a UAV detection method that combines graph-theory-based image segmentation with HOG-FLD feature fusion. The method consists of a training stage and a test stage. In the training stage, the positive and negative sample images collected in advance are resized to a uniform pixel size and converted to gray scale; feature extraction with the HOG-FLD fusion method then yields a low-dimensional feature vector that stably describes the UAV contour. Finally, the feature vectors are fed into a support vector machine, and the SVM classifier is trained through statistical learning. In the test stage, candidate regions are screened out with the graph-theory-based segmentation method, features are extracted from them in the same way as in the training stage, and the extracted features are passed to the trained classifier to decide whether a candidate region contains a UAV. The segmentation merges regions using the connected-component method of graphs [6] or the minimum spanning tree, deciding whether to merge two regions mainly according to their similarity. This approach not only yields the global features of the image and the UAV candidate regions, but is also fast to compute and efficient to process. The features used to train the classifier are obtained by combining the histogram of oriented gradients (HOG) with Fisher linear discriminant analysis (FLD).
Because the features extracted by HOG-FLD fusion are low-dimensional, robust, discriminative, and largely insensitive to changes in local illumination, background, target position, and viewing angle, it becomes easier to train a classifier that separates the classes well and generalizes strongly.
Image segmentation divides an image into regions with distinct characteristics and separates the objects of interest from the background within the segmented blocks, so that, in terms of human visual perception, the objects of interest contrast clearly with the background. Segmentation plays an important role in further image analysis tasks such as image compression and image recognition.
This paper segments the image with a graph-theory-based method (graph-based image segmentation) followed by region merging. The graph-based method treats the image as a weighted undirected graph and segments it with the connected-component method of graphs [6] to obtain region blocks. The region-merging method then merges the segmented blocks according to specific merging rules. Because the UAV and the background differ greatly in texture, color, size, and uniformity, blocks with large differences must be kept apart and blocks with small differences must be merged [11], following the principle of optimal segmentation [10].
An image contains information such as shape, size, color, and texture. The image is segmented after the whole image is traversed by a graph search algorithm. In graph theory, the mapping from image to graph [12] is defined as follows: the image is represented as a weighted undirected graph G = (V, E), where each vertex v_i in V corresponds to a pixel and each edge (v_i, v_j) in E connects a pair of neighboring pixels, with a weight w(v_i, v_j) >= 0 measuring the dissimilarity (for example, the gray-level or color difference) between the two pixels.

The dissimilarity between two segmented regions C_1 and C_2 can be understood as the weight of the minimum edge connecting the vertices of the two regions, defined as

Dif(C_1, C_2) = min w(v_i, v_j), over v_i in C_1, v_j in C_2, (v_i, v_j) in E.

When there is no edge connecting the two regions, Dif(C_1, C_2) = infinity.

If the boundary predicate

Dif(C_1, C_2) > MInt(C_1, C_2)

is satisfied, a boundary is considered to exist between the two regions and they remain segmented; otherwise they are merged.

The smallest internal difference of the division, MInt, is defined as

MInt(C_1, C_2) = min( Int(C_1) + k/|C_1|, Int(C_2) + k/|C_2| ),

where Int(C) is the largest edge weight in the minimum spanning tree of region C, |C| is the size of the region, and k is the segmentation threshold parameter.
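The segmentation criterion described above can be sketched in Python with NumPy as a Kruskal-style pass over the sorted edges with a union-find structure. This is a minimal grayscale sketch, not the paper's implementation: the function names and the tiny test image are illustrative, and a practical system would work on color images and add a minimum-size post-processing step.

```python
import numpy as np

class UnionFind:
    """Disjoint sets tracking size and internal difference Int(C)."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # max MST edge weight inside each component

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b, w):
        a, b = self.find(a), self.find(b)
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = w  # edges arrive in increasing order, so w = Int(C)

def felzenszwalb_gray(img, k=90.0):
    """Graph-based segmentation of a grayscale image; returns a label map."""
    h, w = img.shape
    idx = lambda y, x: y * w + x
    edges = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbor edge
                edges.append((abs(float(img[y, x]) - float(img[y, x + 1])),
                              idx(y, x), idx(y, x + 1)))
            if y + 1 < h:  # vertical neighbor edge
                edges.append((abs(float(img[y, x]) - float(img[y + 1, x])),
                              idx(y, x), idx(y + 1, x)))
    edges.sort()  # process edges by non-decreasing dissimilarity
    uf = UnionFind(h * w)
    for wgt, a, b in edges:
        ra, rb = uf.find(a), uf.find(b)
        if ra == rb:
            continue
        # MInt(C1, C2) = min(Int(C1) + k/|C1|, Int(C2) + k/|C2|)
        mint = min(uf.internal[ra] + k / uf.size[ra],
                   uf.internal[rb] + k / uf.size[rb])
        if wgt <= mint:  # no boundary evidence: merge the two regions
            uf.union(ra, rb, wgt)
    return np.array([uf.find(idx(y, x)) for y in range(h)
                     for x in range(w)]).reshape(h, w)

# Illustrative image: uniform left half (0) and right half (200).
img = np.zeros((4, 4))
img[:, 2:] = 200.0
labels = felzenszwalb_gray(img, k=90.0)
```

With k = 90 the weight-200 boundary edges exceed MInt, so the two halves stay separate, matching the boundary predicate above.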
Image Segmentation
Color [13] similarity between regions r_i and r_j is measured on their L1-normalized color histograms C_i = {c_i^1, ..., c_i^n}:

s_color(r_i, r_j) = sum over k = 1..n of min(c_i^k, c_j^k).

Texture [13] similarity is computed in the same way on the texture histograms T_i = {t_i^1, ..., t_i^n}:

s_texture(r_i, r_j) = sum over k = 1..n of min(t_i^k, t_j^k).

Dimension (size) similarity encourages small regions to merge first:

s_size(r_i, r_j) = 1 - (size(r_i) + size(r_j)) / size(im),

where size(im) is the number of pixels in the whole image.

The similarity set S collects the pairwise similarities of all neighboring regions. To judge whether the boundary condition between two regions is satisfied, the pair with the highest similarity in S is examined at each step: if no boundary exists between the pair, the two regions are merged, their similarities to their neighbors are recomputed, and S is updated until no mergeable pair remains.
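Assuming the merging similarities take the histogram-intersection form given above, they can be sketched as follows; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def colour_similarity(hist_i, hist_j):
    """Histogram intersection of two L1-normalised colour histograms."""
    return float(np.minimum(hist_i, hist_j).sum())

def size_similarity(size_i, size_j, size_image):
    """Close to 1 for small regions, encouraging them to merge first."""
    return 1.0 - (size_i + size_j) / size_image

# Two identical normalised histograms have intersection exactly 1.0.
h = np.array([0.25, 0.25, 0.25, 0.25])
s_col = colour_similarity(h, h)
s_sz = size_similarity(50, 50, 1000)
```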
In practice the method divides into two stages: training and testing. In the training stage, a window of fixed shape and size traverses each UAV or non-UAV sample image, and features are extracted with the HOG-FLD fusion method, yielding a computable feature vector that describes the image contour, is robust, and is easy to classify. The SVM classifier [8-9] for UAV detection is then obtained by statistical analysis and learning on these feature vectors. In the testing stage, features of the regions to be examined are first extracted with the same method as in training, and the trained SVM classifier then classifies the candidate targets in these regions to determine whether each candidate area contains a UAV.
Because the edge contour of a UAV is stable and extensible, the HOG feature, which describes contours well, is extracted from the data set. However, the HOG feature vector has a large dimension, which hinders classifier training. To train a classifier with good discriminative ability, the dimension of the extracted HOG feature is reduced by FLD [18], finally yielding a low-dimensional feature vector. For these reasons, feature extraction based on HOG-FLD fusion is adopted in this article.
The basis of feature extraction in this article is the histogram of oriented gradients (HOG). The idea of the algorithm is as follows: the gradient of the selected image is calculated; the whole image is divided into rectangular cells of fixed, equal size; within each cell the gradient orientations of the pixels are quantized into a fixed number of bins (nine in this article), and the gradient magnitudes accumulated in each bin form the orientation histogram of the cell.
The steps of extracting the HOG feature are as follows: the positive UAV sample image is converted to gray scale, and the input image is filtered with Gamma correction so that it reaches the standard contrast of the color space, reducing the effects of local shadows and illumination changes; the image is divided into a certain number of cells, and a fixed number of cells are grouped into equal-sized blocks as described above; the gradient range is quantized according to the rules above, the cell features within each block are calculated, and finally all blocks are concatenated to obtain the feature vector of the whole target UAV image. Formula (11) normalizes the image, formulas (12) and (13) calculate the gradient components of each pixel, and formulas (14) and (15) calculate the magnitude and direction of the gradient:

I(x, y) = I(x, y)^gamma                                   (11)

G_x(x, y) = H(x + 1, y) - H(x - 1, y)                     (12)

G_y(x, y) = H(x, y + 1) - H(x, y - 1)                     (13)

G(x, y) = sqrt( G_x(x, y)^2 + G_y(x, y)^2 )               (14)

alpha(x, y) = arctan( G_y(x, y) / G_x(x, y) )             (15)

where I(x, y) is the pixel value at (x, y), gamma is the correction coefficient, H(x, y) is the pixel value of the normalized image, G_x and G_y are the horizontal and vertical gradient components, G(x, y) is the gradient magnitude, and alpha(x, y) is the gradient direction.
In this article a window of 64 × 128 pixels scans the sample images and the image under test, with a scanning step of 8 pixels in both the horizontal and vertical directions. The window is divided into cells of 8 × 8 pixels, giving 8 × 16 = 128 cells. Each group of four adjacent cells (two by two) forms a block, so a window contains 105 blocks. Following the HOG calculation steps, a window therefore produces a 3780-dimensional feature vector, the HOG descriptor. The HOG algorithm is illustrated in Figure 2:
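The block and dimension arithmetic above (105 blocks, 3780 dimensions) can be checked with a short sketch. The helper names are illustrative; the parameters mirror the text (8-pixel cells, 2 × 2-cell blocks, 8-pixel stride, 9 orientation bins), and the gradient step follows the centered-difference gradient formulas referenced in the text.

```python
import numpy as np

def hog_dimension(win_h, win_w, cell=8, cells_per_block=2, bins=9, stride=8):
    """Length of the HOG descriptor for a win_h x win_w detection window."""
    block = cells_per_block * cell                 # block side in pixels
    blocks_y = (win_h - block) // stride + 1       # 15 for a 128-pixel height
    blocks_x = (win_w - block) // stride + 1       # 7 for a 64-pixel width
    return blocks_y * blocks_x * cells_per_block ** 2 * bins

def gradient_magnitude_direction(img):
    """Centered-difference gradient magnitude and direction of a gray image."""
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal component G_x
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical component G_y
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)
    return magnitude, direction

dim = hog_dimension(128, 64)  # 15 * 7 = 105 blocks, 105 * 4 * 9 = 3780
ramp = np.tile(np.arange(5.0), (5, 1))  # illustrative horizontal intensity ramp
mag, ang = gradient_magnitude_direction(ramp)
```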
Principle Diagram of HOG Algorithm
On the basis of the HOG feature, a linear subspace is constructed with Fisher linear discriminant analysis (FLD) [18]. The optimal projection matrix maximizes the Fisher criterion

J(W) = (W^T S_b W) / (W^T S_w W),

where S_b is the between-class scatter matrix and S_w is the within-class scatter matrix of the training features. The columns of the optimal projection matrix W are the generalized eigenvectors of S_b w = lambda S_w w with the largest eigenvalues; projecting the HOG features onto W yields the reduced feature vectors of the training set, and the similarity of the projected vectors is taken as the similarity degree between samples.

Because the HOG feature dimension is much larger than the number of training samples, the within-class scatter matrix S_w is singular. To solve this problem, the features are first projected into a lower-dimensional subspace before the Fisher criterion is applied, so that S_w becomes invertible.

The cosine similarity of two projected vectors a and b is

cos(a, b) = (a · b) / (||a|| ||b||).
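For the two-class case used here (UAV vs. non-UAV), the Fisher projection reduces to a single direction, which can be sketched as follows. The small regularization term stands in for the singularity fix discussed above, and the toy data are illustrative.

```python
import numpy as np

def fisher_direction(X1, X2, eps=1e-6):
    """Two-class Fisher direction w proportional to Sw^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter Sw = sum of per-class scatter matrices.
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    Sw += eps * np.eye(Sw.shape[0])  # regularise when Sw is singular
    w = np.linalg.solve(Sw, m1 - m2)
    return w / np.linalg.norm(w)

# Illustrative data: two clusters mirrored through the origin.
X1 = np.array([[2.0, 0.0], [3.0, 0.0], [2.5, 1.0]])
X2 = -X1
w = fisher_direction(X1, X2)
```

Projecting onto w maximizes the between-class separation relative to the within-class spread, which is exactly what the criterion J(W) asks for.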
Because the support vector machine (SVM) proposed by Vapnik has a simple structure, global optimality, good generalization, and short training and prediction times [9], this paper uses the SVM as the machine learning tool to learn the regularities of the samples quickly and efficiently and to classify accurately. The main idea of the SVM is to handle the linear inseparability of the original space by mapping the data into a high-dimensional space with a polynomial kernel function. For two-class classification, sample features such as HOG are first extracted in the original space and then represented as vectors in the high-dimensional space. To minimize the error rate of the two-class problem, a hyperplane is sought in the high-dimensional space that separates the two classes.
Let the sample set be {(x_i, y_i)}, i = 1, ..., n, with class labels y_i in {-1, +1}.

The equation of the classification surface is

w · x + b = 0.

After normalization of the discriminant function, the two classes of samples must satisfy

y_i (w · x_i + b) >= 1,  i = 1, ..., n.

The classification margin is then 2 / ||w||, so maximizing the margin is equivalent to minimizing ||w||^2 / 2.

For an SVM with inner-product (kernel) function K(x_i, x_j), the maximization requirement becomes the dual problem

max Q(alpha) = sum_i alpha_i - (1/2) sum_i sum_j alpha_i alpha_j y_i y_j K(x_i, x_j).

Its constraints are expressed as

sum_i alpha_i y_i = 0,  0 <= alpha_i <= C,  i = 1, ..., n.

The resulting support vector machine decision function is

f(x) = sgn( sum_i alpha_i y_i K(x_i, x) + b ),

where the alpha_i are the Lagrange multipliers obtained from the dual problem, C is the penalty parameter, and only the samples with alpha_i > 0 (the support vectors) contribute to the decision.
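The optimization above is normally solved in the dual; as a self-contained illustration of the same maximum-margin idea, here is a minimal linear SVM trained by sub-gradient descent on the hinge loss. This is a primal approximation for illustration only, not the kernel method used in this paper; the data, learning rate, and regularization strength are illustrative.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.1, lam=0.001, epochs=500):
    """Sub-gradient descent on the regularised hinge loss:
    minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i (w . x_i + b)))."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) < 1:   # margin violated: hinge is active
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                            # only the regulariser acts
                w -= lr * lam * w
    return w, b

# Linearly separable toy data: positives upper-right, negatives lower-left.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
```

After training, every sample satisfies y_i (w · x_i + b) > 0, i.e. all points lie on the correct side of the learned hyperplane.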
In this article 900 positive and 900 negative sample images are selected, and 500 of the original positive images are used to estimate the aspect ratio of the UAV. The resulting aspect ratio is 1:2, and the pixels are normalized to 256 × 512 to eliminate the effect of image size on recognition. As described in the HOG feature section above, each image produces 105 blocks, each block yields a 36-dimensional feature vector, and an image therefore produces a 3780-dimensional HOG feature vector. The extracted HOG vector is used as the input of the Fisher linear discriminant analysis algorithm to reduce the dimension of the whole vector; the reduced dimension is an adjustable parameter, determined by the recognition efficiency observed in the experiments. The reduced vectors are fed into the SVM model, yielding the SVM classifier that detects whether the image under test contains a UAV.
The experimental environment is the Matlab R2015b simulation software on a 64-bit Windows 7 operating system, with an i5-6500 CPU and 4.0 GB of installed memory; 200 test images are used. The best detection effect is obtained by tuning the important parameters that affect the experimental results.
In the feature extraction process, the target dimension of the Fisher linear discriminant analysis reduction is the parameter k of the FLD algorithm, and the recognition time changes with this parameter. The comparison of recognition times is shown in Figure 3, from which it can be seen that the whole algorithm takes the least recognition time when k is 50.
Recognition Time as the Parameter k Changes
Experiments show that the whole algorithm achieves the best overall recognition performance in both accuracy and time when the blocks contain 8 × 8 pixels, the SVM kernel function type is 2, and the segmentation threshold is k = 90 with sigma = 10. The recognition result is shown in Figure 4:
Identification Results
To verify the efficiency of the proposed method, its experimental results are compared with those of the image-segmentation-based HOG and SVM method, and statistics are collected as shown in TABLE I; the average recognition time of the proposed method is shorter than that of the latter.
COMPARISON RESULTS OF THE TEST METHODS
This article uses a region-of-interest mechanism to obtain candidate regions. In the testing phase the acquired regions of interest are input into the trained SVM classifier, which reduces the recognition time, and in the feature extraction phase the dimension reduction of Fisher linear discriminant analysis (FLD) makes the SVM easy to train. Comparing the image-segmentation-based HOG-SVM detection method with the region-of-interest-based aerial UAV detection algorithm in Matlab simulation experiments, the results show that the region-of-interest-based detection algorithm outperforms the image-segmentation-based sliding-window method in both accuracy and time for UAV detection.