With the rapid development of military science and technology, military reconnaissance is becoming more diversified, more intelligent, and higher in resolution. These trends have greatly improved the ability to acquire battlefield information and pose greater challenges to military camouflage technology [1]. Against this backdrop, digital camouflage, as one branch of military camouflage, plays an important role in countering military reconnaissance.
To cope with diversified reconnaissance techniques, researchers in China have studied digital camouflage technology in depth. Cai Yunxiang et al. [2] used fractal dimension estimation based on fractional Brownian motion and a layer-by-layer fuzzy C-means clustering algorithm with a pyramid structure to extract texture and primary color features from background images, and generated digital camouflage patterns from these features.
However, this generation method is relatively cumbersome and requires considerable practical experience to produce high-quality camouflage. To generate digital camouflage patterns automatically, Yang Wuxia et al. [2] applied the K-means algorithm to the color histogram of the target background image to extract its main colors and generate digital camouflage. Jia Qi et al. [3] built a digital camouflage design system using Markov random fields and pyramid models, achieving a first degree of automation in digital camouflage design. With the development of deep learning, Teng Xu et al. [3] combined cycle-consistent adversarial networks with densely connected convolutional networks to generate digital camouflage patterns quickly and automatically, but the generated patterns still lacked rich details and textures.
To generate camouflage patterns that blend seamlessly with the background and exhibit realistic texture details, this paper improves the CycleGAN [4] (Cycle-Consistent Generative Adversarial Network) model. Building on the standard CycleGAN framework, it introduces a channel attention mechanism into the residual network used for feature extraction, adds a color loss function, and improves the adversarial loss function. These modifications address the lack of fine detail and texture in the generated patterns.
CycleGAN is an unsupervised generative adversarial network that translates images from one domain to another, for example from domain X to domain Y, without requiring paired training data. Its key innovation is to enforce bidirectional translation through a cycle consistency loss: an image translated to the other domain and then translated back should reconstruct the original image, which keeps the translation faithful to the input.
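A complete CycleGAN model consists of two generator-discriminator pairs, one for each translation direction:
The first pair consists of generator $G: X \rightarrow Y$ and discriminator $D_Y$, which distinguishes translated images $G(x)$ from real images $y$. The second pair consists of generator $F: Y \rightarrow X$ and discriminator $D_X$, which distinguishes translated images $F(y)$ from real images $x$.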
During training, the two generator-discriminator pairs are trained alternately. By optimizing the adversarial loss and the cycle consistency loss, the model gradually learns the mapping between the two image domains and generates images that match the characteristics and distribution of the target domain. The overall structure is shown in Figure 1.
Structure of CycleGAN
ResNet (Residual Network) is a deep network architecture that uses residual blocks with skip connections to ease the vanishing-gradient problem when training deep networks, thereby improving network performance. CycleGAN builds its generators from multiple basic residual blocks [6]. Although residual blocks improve information flow and mitigate gradient diffusion and network degradation, they do not by themselves significantly improve the quality of the generated images.
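The loss function of CycleGAN comprises three terms: the adversarial loss (referred to here as the GAN loss), the cycle consistency loss (cycle loss), and the identity mapping loss (identity loss). The adversarial loss drives adversarial learning between the generator and the discriminator and encourages the generator to produce more realistic samples. CycleGAN uses two adversarial losses, one per translation direction. Taking generator $G$ and discriminator $D_Y$ as an example, the adversarial loss is
$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y\sim p_{data}(y)}\left[\log D_Y(y)\right] + \mathbb{E}_{x\sim p_{data}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right]$$
where $x$ and $y$ are samples drawn from domains $X$ and $Y$, $G(x)$ is the image translated from $x$, and $D_Y(\cdot)$ is the probability the discriminator assigns to its input being a real image from domain $Y$. $G$ tries to minimize this loss while $D_Y$ tries to maximize it; the loss $\mathcal{L}_{GAN}(F, D_X, Y, X)$ for $F$ and $D_X$ is defined symmetrically.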
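In the original CycleGAN formulation, the cycle and identity losses are mean absolute (L1) differences: $F(G(x))$ should reproduce $x$, $G(F(y))$ should reproduce $y$, and each generator should leave images that already belong to its target domain unchanged. The pure-Python sketch below illustrates that formulation only; images are flattened lists of numbers, and `G` and `F` are toy placeholder functions, not trained networks.

```python
from typing import Callable, List

Image = List[float]                 # a flattened image, one float per pixel value
Generator = Callable[[Image], Image]


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_loss(G: Generator, F: Generator, xs: List[Image], ys: List[Image]) -> float:
    """E_x ||F(G(x)) - x||_1 + E_y ||G(F(y)) - y||_1, each averaged over its batch."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


def identity_loss(G: Generator, F: Generator, xs: List[Image], ys: List[Image]) -> float:
    """E_y ||G(y) - y||_1 + E_x ||F(x) - x||_1: each generator should not alter
    images that are already in its output domain."""
    return (sum(l1(G(y), y) for y in ys) / len(ys)
            + sum(l1(F(x), x) for x in xs) / len(xs))


if __name__ == "__main__":
    # Toy "generators": G brightens every pixel by 0.1, F darkens by 0.1.
    G = lambda img: [p + 0.1 for p in img]
    F = lambda img: [p - 0.1 for p in img]
    xs = [[0.2, 0.4], [0.6, 0.8]]
    ys = [[0.3, 0.5]]
    print(cycle_loss(G, F, xs, ys))     # close to 0: F undoes G and vice versa
    print(identity_loss(G, F, xs, ys))  # close to 0.2: each generator shifts pixels by 0.1
```

The toy pair has almost zero cycle loss yet a non-zero identity loss, which shows that the two terms constrain different behaviors of the generators.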
The image generation quality of the standard CycleGAN model is limited because simply stacking residual blocks in the generator introduces many redundant channels, which hinder the extraction of fine-grained features. To address this issue, this paper optimizes the residual blocks with the SENet channel attention mechanism and improves the loss function, aiming to raise the quality of the generated images.
In the original CycleGAN architecture, image features are typically extracted by convolutional and pooling layers, but the features extracted in this way are often highly redundant. To address this, a channel attention mechanism is introduced. Channel attention computes the importance of each channel, so that the network focuses on the more informative channels and filters out redundant information. This paper uses the classic channel attention method SENet [8] (Squeeze-and-Excitation Networks) and combines it with ResNet to form SE-ResNet modules, thereby enhancing the network's feature extraction capability.
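The SENet module consists of two core operations, Squeeze and Excitation. The Squeeze operation compresses each feature channel by global average pooling. The Excitation operation learns inter-channel dependencies through fully connected layers and generates a weight for each channel, thereby recalibrating the feature channels. As shown in Figure 2, the input data $X$ first undergoes a convolutional operation $F_{tr}$ that produces the feature map $U = [u_1, u_2, \dots, u_C] \in \mathbb{R}^{C \times H \times W}$, which is then processed in three steps:
Step One: Squeeze ($F_{sq}$). Each channel is reduced to a single value by global average pooling: $z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i,j)$.
Step Two: Excitation ($F_{ex}$). Two fully connected layers produce the channel weights: $s = F_{ex}(z, W) = \sigma\left(W_2\,\delta(W_1 z)\right)$, where $\delta$ is the ReLU function, $\sigma$ is the sigmoid function, $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$, $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$, and $r$ is the reduction ratio.
Step Three: Scale ($F_{scale}$). Each channel is multiplied by its weight: $\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$.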
Schematic diagram of SENet
In this process, the Squeeze operation uses global pooling to reduce the feature map from C×H×W to C×1×1. The Excitation operation uses fully connected layers to generate a weight for each channel. Finally, the Scale operation multiplies the Excitation output by the input feature map, so that each channel of the feature map is weighted by its learned importance.
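The Squeeze, Excitation, and Scale steps can be sketched in pure Python. This is an illustrative sketch rather than the paper's implementation: a feature map is a nested list indexed `[channel][row][column]`, the weight matrices `w1` and `w2` are hand-picked instead of learned, and the biases of the fully connected layers are omitted.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][column], i.e. C x H x W


def squeeze(u: FeatureMap) -> List[float]:
    """Global average pooling: C x H x W -> one descriptor z_c per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in u]


def excitation(z: List[float], w1: List[List[float]], w2: List[List[float]]) -> List[float]:
    """s = sigmoid(W2 . ReLU(W1 . z)); W1 is (C/r) x C and W2 is C x (C/r)."""
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]


def scale(u: FeatureMap, s: List[float]) -> FeatureMap:
    """Multiply every value in channel c by its weight s_c."""
    return [[[s_c * v for v in row] for row in ch] for ch, s_c in zip(u, s)]


def se_block(u: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """Squeeze -> Excitation -> Scale."""
    return scale(u, excitation(squeeze(u), w1, w2))


if __name__ == "__main__":
    # C = 2 channels of size 2 x 2, reduction ratio r = 2 (one hidden unit).
    u = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    w1 = [[1.0, 1.0]]
    print(se_block(u, w1, [[1.0], [-1.0]]))  # channel 0 kept, channel 1 suppressed
```

With `w2` set to zeros every channel weight is sigmoid(0) = 0.5, so the block simply halves the feature map; learned weights instead push informative channels toward 1 and redundant ones toward 0.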
SE blocks are plug-and-play and can be integrated into ResNet, resulting in SE-ResNet. The structure of SE-ResNet is illustrated in Figure 3.
Diagram of the SE-ResNet module
The basic structure of SE-ResNet is similar to that of a standard residual network and consists of multiple residual blocks. The key difference is that an SE module is added to each residual block to extract channel attention information. As shown in Figure 4, the SE modules enable SE-ResNet to mine image features more accurately and to make better use of features from different channels.
Schematic diagram of adding the channel attention mechanism
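In the SE-ResNet design of the SENet paper, the SE module recalibrates the output of the residual branch before it is added to the identity shortcut. The sketch below assumes that arrangement and, like the residual blocks of the reference CycleGAN generator, applies no activation after the addition. `branch` is a hypothetical stand-in for the block's convolution, normalization, and ReLU layers, and the SE weights are hand-picked.

```python
import math
from typing import Callable, List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][column]


def se_recalibrate(u: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """Squeeze (global average pool), excitation (FC-ReLU-FC-sigmoid), scale."""
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in u]
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    return [[[s_c * v for v in row] for row in ch] for ch, s_c in zip(u, s)]


def se_residual_block(x: FeatureMap,
                      branch: Callable[[FeatureMap], FeatureMap],
                      w1: List[List[float]],
                      w2: List[List[float]]) -> FeatureMap:
    """SE-ResNet unit: identity shortcut + SE-recalibrated residual branch."""
    r = se_recalibrate(branch(x), w1, w2)
    return [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(x, r)]


if __name__ == "__main__":
    x = [[[1.0]], [[2.0]]]
    # Hypothetical residual branch that just doubles its input.
    double = lambda fm: [[[2.0 * v for v in row] for row in ch] for ch in fm]
    print(se_residual_block(x, double, [[0.0, 0.0]], [[1.0], [1.0]]))
```

Because the shortcut carries `x` through unchanged, the SE weights can only rescale what the residual branch adds, which keeps the block easy to optimize while still letting attention suppress redundant channels.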
The workflow of SE-ResNet is as follows:
The input image is resized to a uniform size, and the 3-channel image is passed through a 7×7 convolution that maps it to 64 channels. Next, 3×3 convolutions extract features and generate feature maps. A stack of 9 residual blocks then transforms the image features from the source domain to the target domain while propagating the input feature information to the output. Deconvolution (transposed convolution) layers restore the spatial resolution of the high-dimensional feature maps to reconstruct the surface features of the image, and a final convolution layer adjusts the dimensionality of the output. The network structure of this generator is illustrated in Figure 5.
Improving the Generator Network Structure
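The text does not list strides, padding, or input size, so the sketch below assumes the layout of the reference CycleGAN ResNet generator: a 7×7 stride-1 convolution, two 3×3 stride-2 downsampling convolutions, nine residual blocks, two 3×3 stride-2 transposed convolutions, and a final 7×7 convolution, applied to a 256×256 input. It only traces channel counts and spatial sizes with the standard output-size formulas; it computes no features.

```python
from typing import List, Tuple


def conv_out(size: int, k: int, s: int, p: int) -> int:
    """Output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * p - k) // s + 1


def deconv_out(size: int, k: int, s: int, p: int, out_pad: int) -> int:
    """Output size of a transposed convolution: (size - 1) * s - 2p + k + out_pad."""
    return (size - 1) * s - 2 * p + k + out_pad


def trace_generator(size: int = 256, n_res: int = 9) -> List[Tuple[str, int, int]]:
    """Return (layer, channels, spatial size) after each stage of the assumed generator."""
    stages = []
    size = conv_out(size, k=7, s=1, p=3)
    stages.append(("conv7x7", 64, size))
    size = conv_out(size, k=3, s=2, p=1)
    stages.append(("down3x3", 128, size))
    size = conv_out(size, k=3, s=2, p=1)
    stages.append(("down3x3", 256, size))
    for i in range(n_res):  # (SE-)residual blocks keep channels and size unchanged
        stages.append((f"res{i + 1}", 256, size))
    size = deconv_out(size, k=3, s=2, p=1, out_pad=1)
    stages.append(("up3x3", 128, size))
    size = deconv_out(size, k=3, s=2, p=1, out_pad=1)
    stages.append(("up3x3", 64, size))
    size = conv_out(size, k=7, s=1, p=3)
    stages.append(("conv7x7", 3, size))
    return stages


if __name__ == "__main__":
    for name, channels, hw in trace_generator():
        print(f"{name:8s} {channels:4d} x {hw} x {hw}")
```

Each stride-2 convolution halves the spatial size and each transposed convolution doubles it, so under these assumptions the output has the same 256×256 size as the input while the residual stage works on 256 channels at 64×64.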
In this paper, the WGAN-GP (Wasserstein Generative Adversarial Network with Gradient Penalty) loss replaces the original GAN loss. WGAN-GP matches the overall distribution more closely, better preserves global consistency, and produces more natural samples. In addition, a color preservation loss is included to keep the color features consistent before and after image translation.
The original GAN loss is tied to the Jensen-Shannon (JS) divergence, which saturates when the real and generated distributions do not overlap and then yields vanishing gradients. To avoid this, this paper uses the WGAN-GP loss instead. The Wasserstein distance (also called the Earth Mover's, or EM, distance) used by WGAN-GP changes smoothly as the generated distribution changes. Even when the generated and real image distributions do not overlap, the discriminator D can still estimate a meaningful distance between them and thus evaluate the quality of the images produced by the generator G. This allows D to be optimized by backpropagation and, in theory, mitigates gradient vanishing and gradient explosion.
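The EM distance is defined in equation (4):
$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, \tilde{x}) \sim \gamma}\left[\lVert x - \tilde{x} \rVert\right] \quad (4)$$
In this context, $P_r$ and $P_g$ are the distributions of real and generated samples, $\Pi(P_r, P_g)$ is the set of all joint distributions $\gamma(x, \tilde{x})$ whose marginals are $P_r$ and $P_g$, and the EM distance is the minimum expected cost of transporting the probability mass of $P_g$ onto $P_r$.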
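To illustrate the point about non-overlapping distributions, the sketch below compares the EM (Wasserstein-1) distance with the JS divergence for two one-dimensional empirical distributions with equal numbers of samples. For this special case the EM distance reduces to the mean absolute difference between the sorted samples. The sample values are arbitrary illustrations, and the JS divergence is computed on the empirical point masses with natural logarithms.

```python
import math
from collections import Counter
from typing import List


def em_distance_1d(a: List[float], b: List[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical distributions."""
    if len(a) != len(b):
        raise ValueError("this shortcut requires equal sample counts")
    return sum(abs(p - q) for p, q in zip(sorted(a), sorted(b))) / len(a)


def js_divergence(a: List[float], b: List[float]) -> float:
    """JS divergence (natural log) between the empirical distributions of a and b."""
    pa, pb = Counter(a), Counter(b)
    js = 0.0
    for v in set(pa) | set(pb):
        p, q = pa[v] / len(a), pb[v] / len(b)
        m = 0.5 * (p + q)
        if p > 0:
            js += 0.5 * p * math.log(p / m)
        if q > 0:
            js += 0.5 * q * math.log(q / m)
    return js


if __name__ == "__main__":
    real = [0.0, 1.0, 2.0]
    for shift in (5.0, 10.0):
        fake = [v + shift for v in real]  # the two supports do not overlap
        print(shift, em_distance_1d(real, fake), js_divergence(real, fake))
```

Moving the generated samples from a shift of 5 to a shift of 10 doubles the EM distance, while the JS divergence stays at log 2 in both cases, so only the EM distance tells the generator which direction improves its samples.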
To implement the gradient penalty, a penalty on the discriminator's gradient is added to the EM distance, as shown in formula (5):
$$GP = \lambda\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right] \quad (5)$$
At each parameter update, samples $\hat{x}$ are drawn from linear interpolations between real and generated samples, the gradients of the discriminator at these points are computed, and deviations of their norms from 1 are penalized, which keeps the gradient norms within a reasonable range. Here $\hat{x} = \epsilon x + (1-\epsilon)\tilde{x}$ with $x \sim P_r$, $\tilde{x} \sim P_g$, and $\epsilon \sim U[0,1]$, and $\lambda$ is the penalty coefficient.
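The objective function of the WGAN-GP model is shown in equation (6):
$$L = \mathbb{E}_{\tilde{x}\sim P_g}\left[D(\tilde{x})\right] - \mathbb{E}_{x\sim P_r}\left[D(x)\right] + \lambda\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right] \quad (6)$$
The discriminator minimizes $L$, while the generator minimizes $-\mathbb{E}_{\tilde{x}\sim P_g}[D(\tilde{x})]$.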
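The sketch below illustrates the standard WGAN-GP critic objective in plain Python: the mean critic score on generated samples minus the mean score on real samples, plus λ times the mean squared deviation of the gradient norm from 1 at random interpolates. It is not the paper's modified objective. Assumptions: samples are flat lists of floats, the critics are hypothetical linear functions, gradients are estimated by central finite differences where a PyTorch implementation would use `torch.autograd.grad`, and λ = 10 follows the default of the original WGAN-GP paper.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]               # a flattened sample
Critic = Callable[[Vector], float]


def numerical_grad(D: Critic, x: Vector, h: float = 1e-5) -> Vector:
    """Central-difference estimate of the gradient of D at x."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        grad.append((D(plus) - D(minus)) / (2 * h))
    return grad


def gradient_penalty(D: Critic, reals: List[Vector], fakes: List[Vector],
                     lam: float = 10.0, rng: Optional[random.Random] = None) -> float:
    """lam * mean of (||grad D(x_hat)||_2 - 1)^2 over random interpolates x_hat."""
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for x, x_fake in zip(reals, fakes):
        eps = rng.random()  # eps drawn uniformly from [0, 1)
        x_hat = [eps * a + (1.0 - eps) * b for a, b in zip(x, x_fake)]
        norm = math.sqrt(sum(g * g for g in numerical_grad(D, x_hat)))
        total += (norm - 1.0) ** 2
    return lam * total / len(reals)


def critic_loss(D: Critic, reals: List[Vector], fakes: List[Vector],
                lam: float = 10.0, rng: Optional[random.Random] = None) -> float:
    """Critic objective E[D(fake)] - E[D(real)] + gradient penalty (minimized by D)."""
    mean_fake = sum(D(x) for x in fakes) / len(fakes)
    mean_real = sum(D(x) for x in reals) / len(reals)
    return mean_fake - mean_real + gradient_penalty(D, reals, fakes, lam, rng)


def generator_loss(D: Critic, fakes: List[Vector]) -> float:
    """Generator objective -E[D(fake)] (minimized by G)."""
    return -sum(D(x) for x in fakes) / len(fakes)


if __name__ == "__main__":
    # Hypothetical linear critics: gradient norm 1 gives no penalty, norm 2 does.
    unit_critic = lambda v: 0.6 * v[0] + 0.8 * v[1]
    steep_critic = lambda v: 1.2 * v[0] + 1.6 * v[1]
    reals, fakes = [[1.0, 1.0]], [[0.0, 0.0]]
    print(critic_loss(unit_critic, reals, fakes))        # about -1.4, penalty near 0
    print(gradient_penalty(steep_critic, reals, fakes))  # about 10.0
```

The steeper critic separates the samples more strongly but pays a penalty of λ(2 − 1)² = 10, which is exactly the pressure that keeps the critic approximately 1-Lipschitz.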
In this paper, the weight parameters are re-evaluated: images generated from random noise are fed into the network as penalty terms, and gradient clipping is applied to address the training instability that arises when using the EM distance. The improved objective function is shown in formula (7).
The standard GAN loss pushes generated samples to resemble real samples, but on its own it can lead to inconsistent details in the generated images. WGAN-GP adds a gradient penalty term that constrains the norm of the discriminator's gradient, which prevents gradient explosion and vanishing. This term also stabilizes training and helps the network generate more realistic and clearer samples.
To increase the similarity between the generated and original images, a color preservation loss is added to the cycle consistency loss. This loss measures the distance between the distributions of real and generated samples. Compared with other loss functions, it is relatively insensitive to outliers, which makes it more robust to noise and abnormal samples. Regularizing with the color preservation loss also encourages the model to select features with fewer non-zero weights, which improves generalization and interpretability. The color preservation loss is given in equation (8), and the total loss function in equation (9).
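Equation (8) itself is not reproduced in this text, so the following is only a hedged sketch of one plausible realization of a color preservation term: a mean absolute (L1) difference between the RGB values of the input image and the translated image. The L1 form is an assumption chosen because it matches the outlier-robustness property described above; the squared-error function is included only for comparison, and pixel values are assumed to be normalized to [0, 1].

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), each assumed to lie in [0, 1]
Image = List[Pixel]                 # flattened list of pixels


def color_l1(src: Image, out: Image) -> float:
    """Mean absolute per-channel color difference (assumed form of the color loss)."""
    return sum(abs(a - b) for p, q in zip(src, out) for a, b in zip(p, q)) / (3 * len(src))


def color_l2(src: Image, out: Image) -> float:
    """Mean squared per-channel color difference, for comparison only."""
    return sum((a - b) ** 2 for p, q in zip(src, out) for a, b in zip(p, q)) / (3 * len(src))


if __name__ == "__main__":
    src = [(0.0, 0.0, 0.0)] * 4
    uniform_shift = [(0.1, 0.1, 0.1)] * 4               # every channel off by 0.1
    one_outlier = [(1.0, 0.0, 0.0)] + [(0.0, 0.0, 0.0)] * 3  # one channel off by 1.0
    print(color_l1(src, uniform_shift), color_l1(src, one_outlier))  # ~0.1 vs ~0.083
    print(color_l2(src, uniform_shift), color_l2(src, one_outlier))  # ~0.01 vs ~0.083
```

In the example, the single badly wrong channel value dominates the squared error (about eight times the uniform shift), whereas the L1 loss rates it similarly to a uniform small shift.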
The experiments were run on Ubuntu 18.04 with GPU acceleration, using PyTorch 1.4.0 as the deep learning framework. The datasets come from UCMerced_LandUse [10], Places365 [11], and other sources, and the model was pre-trained on the public monet2photo dataset.
The images were preprocessed by discarding unusable images, removing poor training samples, excluding near-duplicate samples, and resizing all images to a fixed size. Data augmentation, including upscaling, random rotation, random flipping, and added noise, then expanded the dataset to 4,200 images.
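The augmentation step can be sketched as follows. This is a simplified illustration under stated assumptions: images are grayscale nested lists with values in [0, 1], rotations are restricted to multiples of 90 degrees (arbitrary angles and upscaling need interpolation and are omitted), and the 0.5 flip probability and noise level `sigma` are hypothetical choices, not values from the paper.

```python
import random
from typing import List

Image = List[List[float]]  # H x W grayscale values in [0, 1]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add Gaussian noise and clip back to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def augment(img: Image, rng: random.Random, sigma: float = 0.02) -> Image:
    """Random horizontal flip, random multiple-of-90-degree rotation, then noise."""
    if rng.random() < 0.5:
        img = hflip(img)
    for _ in range(rng.randrange(4)):  # 0, 90, 180 or 270 degrees
        img = rot90(img)
    return add_noise(img, sigma, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    base = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    copies = [augment(base, rng) for _ in range(3)]  # several augmented variants
    for c in copies:
        print(c)
```

Generating several randomized copies per source image in this way is what multiplies the size of a training set.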
Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) were used to evaluate the experimental results.
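Given two images $I$ and $K$ of size $m \times n$, PSNR is defined in equation (10):
$$PSNR = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{MSE}\right) \quad (10)$$
where $MAX_I$ is the maximum possible pixel value (255 for 8-bit images). MSE is the Mean Squared Error, given in equation (11):
$$MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^2 \quad (11)$$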
The higher the PSNR value, the lower the image distortion and the better the image quality.
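A minimal PSNR sketch for single-channel images, assuming 8-bit intensities so that MAX_I = 255. For RGB images a common convention is to average the squared error over all three channels before taking the logarithm.

```python
import math
from typing import List

Image = List[List[float]]  # m x n grayscale intensities


def mse(I: Image, K: Image) -> float:
    """Mean squared error over all m * n pixels."""
    m, n = len(I), len(I[0])
    return sum((I[i][j] - K[i][j]) ** 2 for i in range(m) for j in range(n)) / (m * n)


def psnr(I: Image, K: Image, max_i: float = 255.0) -> float:
    """PSNR in dB; infinite when the images are identical (MSE = 0)."""
    e = mse(I, K)
    return math.inf if e == 0 else 10.0 * math.log10(max_i ** 2 / e)


if __name__ == "__main__":
    I = [[0.0, 0.0], [0.0, 0.0]]
    K = [[0.0, 0.0], [0.0, 255.0]]  # one of four pixels completely wrong
    print(psnr(I, K))               # 10 * log10(4), about 6.02 dB
```

A single fully wrong pixel out of four already drives PSNR down to about 6 dB, whereas the 15 to 19 dB values in the tables correspond to much smaller average errors.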
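Luminance, contrast, and structure are the three basic elements of an image. Their respective comparison functions are given in equations (12), (13), and (14):
$$l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \quad (12)$$
$$c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \quad (13)$$
$$s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \quad (14)$$
where $\mu_x$ and $\mu_y$ are the mean intensities of images $x$ and $y$, $\sigma_x$ and $\sigma_y$ are their standard deviations, $\sigma_{xy}$ is their covariance, and $C_1$, $C_2$, $C_3$ are small constants that stabilize the divisions.
The SSIM algorithm combines the luminance, contrast, and structure terms into the overall image similarity formula (15):
$$SSIM(x,y) = l(x,y)\cdot c(x,y)\cdot s(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \quad (15)$$
where the second form follows from setting $C_3 = C_2/2$.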
SSIM values range from −1 to 1. Larger values indicate greater similarity, SSIM = 1 holds only when the two images are identical, and negative values indicate inversely correlated structure.
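The sketch below computes SSIM from whole-image statistics, which is a simplification: the reference SSIM of Wang et al. slides a local (typically 11×11 Gaussian) window over the images and averages the local values, so results from this sketch will not match library implementations exactly. It assumes the usual constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$ with $L = 255$, exponents of 1 with $C_3 = C_2/2$, and sample (n - 1) variances.

```python
from typing import List


def ssim_global(x: List[float], y: List[float], L: float = 255.0) -> float:
    """Single-window SSIM computed from the statistics of the whole image."""
    n = len(x)
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


if __name__ == "__main__":
    x = [10.0, 20.0, 30.0, 40.0]
    print(ssim_global(x, x))                         # 1.0 for identical images
    print(ssim_global(x, [v + 50 for v in x]))       # below 1: brightness differs
    print(ssim_global(x, [255 - v for v in x]))      # negative: inverted structure
```

The inverted-image example shows how SSIM becomes negative: when structures are anti-correlated, the covariance term in the numerator turns negative.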
This paper conducted both ablation and comparative experiments. In the ablation experiments, after adding the color preservation loss, three model variants were evaluated to validate the proposed approach: CycleGAN + improved loss function, CycleGAN + SENet, and CycleGAN + SENet + improved loss function. In the comparative experiments, the proposed model was compared with two image generation methods, GAN and DRIT, to assess the image generation quality of the different models.
In the ablation experiments, the color preservation loss was incorporated first; the results are shown in Figure 6. With the color preservation loss, the colors of the generated images more closely match those of the input images.
Comparison of Results before and after Adding Color Preservation Loss
Next, CycleGAN was combined with the improved loss function. Three models were trained with the original GAN loss, the WGAN objective function, and the WGAN-GP objective function, respectively. Figure 7 shows how the loss curves of these three loss functions change during training.
Loss Curve Variation Chart
As Figure 7 shows, the WGAN-GP loss adopted in this paper reaches a lower loss value at the Nash equilibrium than the original GAN and WGAN losses, which suggests that the WGAN-GP network generates higher-quality images.
Next, CycleGAN was combined with SENet. Figures 8 and 9 compare the loss curves of the original CycleGAN model and the improved CycleGAN model, with the number of training epochs on the horizontal axis and the loss value on the vertical axis.
Loss Function Change Trend for the Original CycleGAN Model
Loss Function Change Trend for the Improved CycleGAN Model
As Figures 8 and 9 show, the loss of the original CycleGAN model still shows no converging downward trend at 160 epochs, whereas the loss of the improved model begins to decrease at 40 epochs and converges by 200 epochs.
The ablation experiments verify the effects of introducing the SENet structure and the improved loss function. Using a controlled-variable approach, four sets of experiments were performed with different combinations of the improved modules. The resulting evaluation metrics are listed in Table I.
Evaluation index table of ablation experiments
Model | SSIM | PSNR (dB) |
---|---|---|
CycleGAN | 0.50 | 15.6 |
CycleGAN+SENet | 0.61 | 16.7 |
CycleGAN+improved loss function | 0.68 | 15.9 |
CycleGAN+SENet+improved loss function | 0.77 | 18.9 |
According to Table I, the baseline CycleGAN model performs worst. Adding the SENet channel attention mechanism increases SSIM by 0.11 and PSNR by 1.1 dB. Relative to the original CycleGAN, the version with the improved loss function increases SSIM by 0.18 and PSNR by 0.3 dB. Combining both improvements raises SSIM to 0.77 and PSNR to 18.9 dB. These results indicate that each method on its own improves pattern details and texture quality, and that combining them, as in this paper, gives the best pattern generation results.
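The SSIM and PSNR gains quoted above can be recomputed directly from Table I. The values below are copied from the table, and the differences are rounded to the table's precision.

```python
from typing import Dict, Tuple

# (SSIM, PSNR in dB) for each ablation variant, copied from Table I.
TABLE_I: Dict[str, Tuple[float, float]] = {
    "CycleGAN": (0.50, 15.6),
    "CycleGAN+SENet": (0.61, 16.7),
    "CycleGAN+improved loss function": (0.68, 15.9),
    "CycleGAN+SENet+improved loss function": (0.77, 18.9),
}


def gains(table: Dict[str, Tuple[float, float]],
          baseline: str = "CycleGAN") -> Dict[str, Tuple[float, float]]:
    """SSIM and PSNR improvement of each variant over the baseline."""
    base_ssim, base_psnr = table[baseline]
    return {name: (round(s - base_ssim, 2), round(p - base_psnr, 1))
            for name, (s, p) in table.items() if name != baseline}


if __name__ == "__main__":
    for name, (d_ssim, d_psnr) in gains(TABLE_I).items():
        print(f"{name}: SSIM +{d_ssim}, PSNR +{d_psnr} dB")
```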
In this paper, two image generation methods, GAN (Generative Adversarial Network) and DRIT (Diverse Image-to-Image Translation via Disentangled Representations), were selected for comparative experiments to evaluate image generation quality. GAN is the original generative adversarial network, in which two neural networks (a generator and a discriminator) learn by competing against each other. DRIT disentangles image content from style attributes, which allows it to translate an image into multiple different styles.
As shown in Figure 10, (a) shows the target-type image and the original image; (b) is the image generated by the GAN model, whose color and texture differ significantly from the original image; (c) is the image produced by the DRIT model, which retains some texture features but differs greatly in color and appears blurry; and (d) is the image generated by the CycleGAN model, which, compared with DRIT, preserves both texture and color features and shows richer details.
Comparison of images generated by different models
To evaluate the quality of the generated camouflage patterns, two metrics, SSIM and PSNR, were used, and the final evaluation results are presented in Table II.
Comparative experimental evaluation score table
Model | SSIM | PSNR (dB) |
---|---|---|
GAN | 0.19 | 13.5 |
DRIT | 0.48 | 14.8 |
CycleGAN+SENet+improved loss function | 0.77 | 18.9 |
As Table II shows, the SSIM of the proposed method is 0.58 higher than that of the GAN model and 0.29 higher than that of the DRIT model, and its PSNR is 5.4 dB higher than that of GAN and 4.1 dB higher than that of DRIT. Together, Table II and Figure 10 show that the proposed method performs best both in visual perception and in SSIM and PSNR scores.
This paper investigates a digital camouflage generation method based on an improved CycleGAN. First, residual networks are combined with a channel attention mechanism so that the network focuses on the more important channel features. Second, the WGAN-GP loss replaces the original GAN loss as the adversarial loss, mitigating the unstable outputs common in traditional generative adversarial networks and producing more realistic and clearer samples. Finally, a color preservation loss is added to prevent the generator from altering the image's hue during translation. Experimental results show that, compared with the other methods tested, the proposed approach produces camouflage patterns with more realistic texture details and better blending with the background, confirming its effectiveness.