Image segmentation is the division process of independent regions in an image with particular meaning which makes the same region represent the same features. The purpose is extracting the interesting parts called “the foreground”, and the rest parts are called “the background”. However, the characteristic differences between the foreground and the background may be reflected in various aspects such as grayscale, contour, texture, etc., so there is currently no general algorithm to solve the both problems. The result of image segmentation will directly affect the subsequent image processing results. At present, the application field of interactive image processing is very broad, and there are many researches on segmentation algorithms for stereoscopic images and color images.
The superpixel is a set of adjacent pixels which have similar features in space. The superpixel block can retain the effective information of image without affecting the boundary information of the images. In the conventional image processing, the pixel is regarded as a basic processing unit with high complexity. If replace the pixel point with superpixel block, the complexity in the process can be reduced. Therefore, in image segmentation, the image is preprocessed by using superpixels to improve the accuracy and speed of image segmentation.
The basic idea of the SLIC (simple linear iterative clustering) algorithm is converting a color image into a 5-dimensional feature vectors in CIELAB color space, and setting a specific distance metric for the vectors, finally clustering the pixels in the image. The superpixel produced by the SLIC algorithm is not only fast, compact, but also nearly uniform and with good contour.
Specific implementation steps of the SLIC algorithm:
Among them, dc represents the color, ds represents the space, Ns represents the maximum spatial distance within the class, and the formula Ns=S=sqrt(N/K) is applicable to every cluster. The maximum color distance is related to the image and is usually replaced by a fixed constant m. The calculation formula of the distance metric D' in the above formula can be changed to the following formula (4):
Since some pixels are not searched by only one but multiple seed points, it is necessary to calculate all their distances and to compare the minimum value as the cluster center of this pixel.
/*initialization*/
Sampling the pixel by the step size S, initializing the cluster center
Moving the smallest gradient seed point position in the S*S cluster center
Setting the label for each pixel i = -1
Setting the distance for each pixel
/*distribution*/
for every cluster center
for each pixel i is in a surrounding 2S*2S area calculating the distance D between C and i
if D<
setting
setting
/*Update*/
calculating a new cluster center
calculating the residual E
If E<= threshold
End the cycle
In the SLIC superpixel generation algorithm, the segmentation results of the image are different because of the number of superpixel generation and the setting of the compact coefficient. The compact coefficient is the ratio of tightness between the color feature and the XY coordinate feature. Generally, compact coefficient affects the shape of the superpixel.
The number of superpixels has a certain influence on the segmentation effect. In the case where the compact coefficient is 32, change the number of superpixels, and the number of superpixels generated is 128, 256, 512. Segmenting the images (a) and (b) in Fig. 1 respectively, and the experimental results are as follows:
Image (a) segmentation result of different superpixels with the 32compact factor
It can be seen from Fig. 1 that, sometimes, the number of divided super-pixels is slightly smaller than the set value. It is mainly because that the too small segmentation regions are merging together in the processing. We can obtain from the experimental results that as the set value of superpixel number increase, the edge degree of generated superpixel block and the accuracy of segmentation result are high and vice versa.
According to the definition of compact coefficient, it will affect the shape of generated superpixel. Figure 2, the image (a) is divided with the size of the compact factor changing. The experimental results are as follows:
Image (a) different compact coefficient segmentation results when the number of superpixels is 64
It can be seen from Fig. 2, in the case where the number of superpixels is set to be certain, as the compact coefficientIt can be seen from Fig. 2, in the case where the number of superpixels is set to be certain, as the compact coefficient increases, the generated superpixels become more and more regular; on the contrary, they become more and more irregular. However, the larger compact coefficient changes, the greater time complexity of the algorithm will be. After multiple experiments, it is found that when the compact coefficient is set at 40-60, the generated superpixel blocks are relatively regular, which is in line with the further experimental requirements of this paper.
The basic idea of the Graph cut algorithm: users mark the foreground object and the background object, and assign the pixels with the highest probability to the pixels of unknown label according to the existing labels. The algorithm needs to combine the image segmentation problem with the image min cut problem, to parse and store the digital image by related properties of the matrix. Pixels in the grayscale image can be represented by a matrix, the row of the matrix represents the height of the image, the columns represents the width, the elements stored in the matrix are the pixels of image, and the values of matrix elements are the grayscale of pixels.
Step 1: Construct an undirected graph to represent an image which will be split, wherein V and E represent the sets of vertex and edge respectively. Using the Graph Cut algorithm to process the graphs, and the two endpoints are represented by "S" and "T", which called terminal vertices. The remaining endpoints are all connected with these two terminal vertices to form a part of edge collection. There are two different vertices, simultaneously there are also two edges in Graph Cut algorithm, as shown in Fig. 3.
Network of terminal vertices and ordinary vertices
If there is a value whose sum of edge weights is the smallest, then this energy-minimization-based algorithm can be converted into the formula (5):
How to cut these edges to find the minimum cut? In order to solve this problem, we can regard the entire Graph cut image segmentation as a pixel mark problem. The target label set as 1, and the background label set as 0. The entire setup process can be obtained by the energy minimization function described above. Then the cut which can separate the target from the background is the requirements of this article. Using the found cut to classify the pixels in the image and to make different marks on the background and foreground, then the segmentation process of the entire image ends.
The process of energy minimization can be solved according to the maximum flow algorithm, as shown in Fig. 4:
Maximum flow minimum cut diagram
In this paper, the SLIC algorithm and the Graph cut algorithm are combined to segment the image. We use the average pixel of all pixels in the generated superpixel block to instead the original pixel, and the interactive factor is added. The user object selects the area where the target object is located, and the rest of the area is treated as the background point. The image is segmented by the Graph cut algorithm, and the segmentation result and the standard segmentation image are compared each other to analyze the segmentation effect. The algorithm flow is shown in Fig. 5:
Algorithm flowchart
We selected a set of images for segmentation, and compared the results by three data: scorpion rate, accuracy, and recall rate. Among them, the scorpion rate is a more comprehensive indicator that can reflect the regression rate and accuracy rate. The segmentation results are shown in Table 1:
MULTIPLE SETS OF IMAGE SEGMENTATION RESULTS
Relationship between recall rate and accuracy
Among the above 10 pictures, the 6th image "soldier" has a high accuracy rate, but a low regression rate, only 75%, similarly as the 1st image "house" and the 4th image "athlete". These three images are relatively complicate in gray value of the main object, all of which have the pixels of high and low gray value simultaneously. For example, "soldier" has a lower gray value clothing, while the helmet has a higher one. When the helmet is divided into the background, the final segmentation result lost a part of foreground object, so the accuracy is high, but the recall rate is low. On the contrary, it is also the reason why the 2nd image "chicken" has a high recall rate.
The parameter K value determines the proportional relationship between the boundary term and the region term. In this section, we study the difference between the image segmentation results under different K values. As shown in Fig. 7.
Relationship between K value and recall rate
Figure 7 analyzes the change of recall rate under different K values, and we find that not all curves have a recall rate that increases with the rise of K value. This shows that when the image is segmented, only adjusting K value does not effectively increase the recall rate. Through the rate of change in the line chart, it is found that when K increases from 500 to 40500, the curve with the largest amplification in recall value changes by less than 7%. The change of K value is very limited to adjust the recall rate.
Figure 8 analyzes the relationship between K value and accuracy. We find that not all the accuracy of images increase with K value. Some images do not change much with increase of K, such as "cat"; some images will decrease as the K value increases, such as "car." It shows that the increase of K value is not as good as the selection of previous scenic spots and background points, or the selection of the frame selection. It can only distinguish the influence on segmentation result by eyes. Therefore, in actual operation, K value does not have to be set as a variable parameter, but can be specified as an appropriate value by users at the beginning.
It is concluded by complexed Fig. 7 and Fig. 8 that the selection of the K value can only change the edge of the image segmentation in terms of visual effects, and does not significantly affect the segmentation result numerically, and the difference in the influence of the K value on different images is also very big. The 11th sample "dog" image of Fig. 5 is significantly less accurate than the other images. Compared with other samples, we found that the foreground gray value of "dog" is larger than the background gray value, and the detection algorithm uses the small gray value points as the segmentation target to intersect with the marker image (GT image), which makes the image with light color as the main object has a high recall rate in the segmentation task but a obviously low accuracy. On the contrary, when the foreground and background of the image are quite different, the recall rate and accuracy will be high, so the image segmentation effect will be better.
Relationship between K value and accuracy diagram
Figure 9 analyzes the effect of the number of superpixels on the segmentation results. The algorithm process is: first, preprocess the superpixel block generated by the image; then use the superpixel block average value to instead the original pixel value; finally, the preprocessed image is operated by the Graph cut algorithm. If the number of super pixel blocks is different, the experimental results will be different. As shown in Fig. 9, we analyze the segmentation result by changing the number of superpixel blocks generated in a same photo.
As can be seen from Fig. 9, the effect of the number of superpixels on the segmentation result is different for different divided images. Within a certain range, as the number of superpixels increases, the accuracy of the segmentation result increases. While after exceeding this range, the accuracy of segmentation results will decrease when the number of superpixels increases. From the segmentation results of the above several images, we can infer that it is not the more the number of superpixels divided, the better the segmentation effect. When the number of superpixels increases to a certain range, the segmentation effect is almost no longer improved.
Relationship between the number of super pixels and the accuracy rate
This paper implements an improved image segmentation software design based on the combination of superpixel and Graph cut algorithm. Test and analyze different scenarios and parameters, we have achieved a better robustness of the software, and making it adapt to most interactive segmentation tasks.
In the result detection, we find that the selection of the K value and the number of superpixels will determine the degree of detail retention at the edge of the image foreground. Since the changed points are the small proportion of the whole picture pixels, currently it is difficult to judge by the recall rate and the accuracy rate, and only the method of human eye recognition can be used to identify the advantages and disadvantages. Compared with the selection of the K value and the number of superpixels, the selection of the foreground or background points and the optimization of the frame clipping method can have a numerical influence on the segmentation result. The foreground gray value selection should be as large as possible from the background, which is beneficial to the image segmentation result. In the selection of the optimization frame, the foreground object should be framed as completely as possible, and the background should be placed as little as possible to achieve the effect of optimizing the segmentation.
The point and frame selection steps reflect the superiority of interactive segmentation, which can achieve the users' subjective willingness to split in the program. Even if there are multiple theme objects in some scenes, you can divide the different objects by framing them.