Open Access

Deep Learning-based Research on Stylistic Migration and Creative Assistance for Drawing Artworks

19 March 2025


Introduction

From primitive drawing as the foundation of painting to creative drawing as a mature, independent form of painting, research into the language and art form of contemporary drawing has become open and diverse [1-2]. Creative drawing, an emerging art of drawing that fuses concept, aesthetics, form, technique, material and other elements into a single independent work, holds great value for exploration and research. Its rise promotes not only the development of creative drawing and the art of drawing itself, but also closer interoperability and integration among the various art disciplines, so that they develop and progress together [3-6]. Drawing is familiar to both the public and artists, but for a long time, and especially as art has developed into the contemporary era, people, particularly in China, have held ambiguous views about drawing and about the understanding and awareness of creative drawing. The root of this problem is that people overlook the important role of “creativity” in the process of drawing and painting [7-9]. Here, “creativity” refers to the conceptual, concrete and systematic creations that people make by borrowing the art form of sketching. It is therefore important to distinguish between “the basic nature of drawing (work-in-progress drawing)” and “the creative nature of drawing (creative drawing)”. The formation of this question has led to a new understanding and exploration of the ultimate significance of the artistic language of “sketching” [10-12]. The initial understanding of sketching covers basic modeling training and the artist’s pre-creation sketch, that is, the initial stage of learning and mastering painting, the artist’s preparatory sketches and partial explorations, and the construction stage of the complete creative concept; it is precisely in this initial practice of basic modeling, reflection and exploration that the artist finds his or her own “creative” language [13-15]. The style migration technique has attracted much attention in the field of computer vision: it realizes the stylistic transformation of images and draws on image processing, artificial intelligence, computer art and other fields. The rise of this research direction stems from the pursuit of novel approaches to image processing and art creation, and is also driven by the continuous development of computer vision and machine learning techniques [16-18]. Research on style migration has changed the role of computers in art creation and has been furthered by the wave of deep learning. Using deep learning to assist the style migration and creation of drawing artworks can promote the innovative thinking of artists, provide a pioneering and learnable model of innovation for other forms of painting, and promote the development and prosperity of the art of painting [19-22].

In this paper, the Generative Adversarial Network (GAN) in deep learning is used as the substrate: the traditional generative adversarial network is improved, and the Cycle-Consistent Generative Adversarial Network (CycleGAN) is chosen as the method for image style migration of sketch works. Optimization and loss function design of the network model are carried out to construct a style migration model for sketch works based on the improved GAN. Relevant experiments are designed to compare the algorithmic loss of this paper’s CycleGAN model with that of other image style migration models, to compare the number of parameters and running time of each image style migration model, and to test the running efficiency of this paper’s CycleGAN model. In addition, subjective and objective evaluations of the style-migrated sketches generated by each image style migration model are conducted to assess the quality of the images generated by this paper’s CycleGAN model.

Improved GAN-based style migration model for sketches
Cycle-Consistent Generative Adversarial Network (CycleGAN)

Zhu et al. first proposed the Cycle-Consistent Generative Adversarial Network (CycleGAN) to realize migration between asymmetric data [23-24]. CycleGAN has wide applicability and utility because it accepts unpaired training data and learns the mapping between a source domain and a target domain; this ability allows CycleGAN to perform tasks such as image style migration without requiring paired training data. The model is a cyclic adversarial training structure, an image style migration model based on the idea of pairwise mapping, and consists of two GAN networks, that is, two generators G and F and two discriminators DX and DY. The CycleGAN network architecture is shown in Fig. 1. The network learns the image migration task from domain X to domain Y and uses an adversarial loss function to learn the generator G: X → Y, such that the discriminator cannot distinguish the generated image from the original image. The network also maps domain Y back to domain X, which is realized by the generator F, i.e., F: Y → X. In addition, the two discriminators DX and DY judge the migrated images to ensure image quality. In the loss function, the cycle-consistency loss is introduced as a constraint between the two generators. It ensures that the generators produce images corresponding to the original image, preventing them from deceiving the discriminator by generating arbitrary images in the real image domain; it guarantees that G and F are inverses of each other and limits the space of generated images in a practically meaningful way.

Figure 1.

CycleGAN network architecture

As shown in panel (a) of the figure above, the real image PX of domain X is first migrated into a fake image PY of domain Y by generator G, and PY is then reconstructed by generator F; through this reconstruction the feature information of the original image is preserved. The process is cyclically consistent, i.e., x → G(x) → F(G(x)) ≈ x and y → F(y) → G(F(y)) ≈ y. In the figure, (b) is the forward cycle consistency and (c) is the reverse cycle consistency. DY is used to judge whether an image belongs to domain Y: the image x passes through generator G to produce the domain-Y image ŷ, the generator F reconstructs ŷ into an image x̂ similar to x, and x should be as consistent with x̂ as possible (and vice versa); the distance between the two is the cycle-consistency loss. The adversarial loss function in Fig. (b) is as follows:

$$L_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$$

The adversarial loss function in Fig. (c) is as follows:

$$L_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_X(F(y)))]$$

The cycle-consistency loss function, using an L1 loss, is given by:

$$L_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\|F(G(x)) - x\|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\|G(F(y)) - y\|_1]$$

The resulting CycleGAN total loss function can be expressed as:

$$L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + \lambda L_{cyc}(G, F)$$

where λ controls the relative weight of the cycle-consistency term. As in the original GAN, we solve a minimax problem:

$$G^*, F^* = \arg\min_{G, F} \max_{D_X, D_Y} L(G, F, D_X, D_Y)$$
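To make the objective concrete, the following PyTorch sketch shows one way to compute the adversarial and cycle-consistency terms above. It is an illustrative implementation rather than the exact code of this paper: the discriminators are assumed to output probabilities in (0, 1), and the weight lam plays the role of λ.

```python
import torch
import torch.nn as nn

# Adversarial loss in the log form above and L1 cycle-consistency loss.
bce = nn.BCELoss()
l1 = nn.L1Loss()

def adversarial_loss_D(D, real, fake):
    # The discriminator is pushed toward 1 on real images and 0 on generated ones.
    out_real = D(real)
    out_fake = D(fake.detach())  # do not backpropagate into the generator here
    return bce(out_real, torch.ones_like(out_real)) + \
           bce(out_fake, torch.zeros_like(out_fake))

def generator_objective(G, F, D_X, D_Y, x, y, lam=10.0):
    # G: X -> Y and F: Y -> X, as in the equations above.
    fake_y, fake_x = G(x), F(y)
    out_y, out_x = D_Y(fake_y), D_X(fake_x)
    # The generators try to make the discriminators label their outputs as real.
    loss_gan = bce(out_y, torch.ones_like(out_y)) + bce(out_x, torch.ones_like(out_x))
    # Cycle consistency: F(G(x)) ≈ x and G(F(y)) ≈ y, measured with the L1 norm.
    loss_cyc = l1(F(fake_y), x) + l1(G(fake_x), y)
    return loss_gan + lam * loss_cyc
```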

Generator and Discriminator Architecture
Generator Architecture

A generator takes random noise as input and produces synthetic samples that resemble real training data [25]. A generator usually consists of one or more deep neural networks, often using convolutional layers to generate images or recurrent layers to generate sequential data. The output of the generator is fed to a discriminator, which is then trained to distinguish the generated samples from the real training data. The generator architecture is shown in Figure 2.

Figure 2.

Generator architecture

The generator is a key building block in the CycleGAN architecture, and understanding its role and structure is critical to understanding the CycleGAN training process. The generator pipeline consists of three components: the latent space, the generator, and the image generation stage. The generator samples from the latent space and establishes a relationship between the latent space and the output; a neural network is then created that maps from this input (the latent space) to the output (in most cases an image). During adversarial training, the generator and discriminator are connected in one model, and the generator is trained to produce images that are indistinguishable from real images. Ultimately, the generator produces the output images we see after the entire training process. When training CycleGAN, the focus is on training the generator, whereas in most architectures the discriminator requires several epochs of training before generator training begins.

Each component of the CycleGAN architecture is defined as a class, and the generator class has three main functions: the class template, the loss function, and the buildModel function. The loss function is a custom loss function that is used to train the model when needed, while the buildModel function builds the actual neural network model. Model-specific training sequences will be included in this class, although we may only use internal training methods for discriminators.
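As a concrete illustration of what such a buildModel step might construct, the sketch below shows a typical image-to-image generator of the kind used in CycleGAN (downsampling convolutions, residual blocks, upsampling convolutions). The layer widths and block count are illustrative assumptions, not the exact configuration used in this paper.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block commonly used in CycleGAN generators (assumed here)."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.block(x)

class Generator(nn.Module):
    """Illustrative generator: downsample -> residual blocks -> upsample."""
    def __init__(self, in_ch=3, base=64, n_blocks=6):
        super().__init__()
        layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, base, 7),
                  nn.InstanceNorm2d(base), nn.ReLU(True),
                  nn.Conv2d(base, base * 2, 3, stride=2, padding=1),
                  nn.InstanceNorm2d(base * 2), nn.ReLU(True),
                  nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1),
                  nn.InstanceNorm2d(base * 4), nn.ReLU(True)]
        layers += [ResBlock(base * 4) for _ in range(n_blocks)]
        layers += [nn.ConvTranspose2d(base * 4, base * 2, 3, stride=2, padding=1, output_padding=1),
                   nn.InstanceNorm2d(base * 2), nn.ReLU(True),
                   nn.ConvTranspose2d(base * 2, base, 3, stride=2, padding=1, output_padding=1),
                   nn.InstanceNorm2d(base), nn.ReLU(True),
                   nn.ReflectionPad2d(3), nn.Conv2d(base, in_ch, 7), nn.Tanh()]
        self.model = nn.Sequential(*layers)
    def forward(self, x):
        return self.model(x)
```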

Discriminator Architecture

The discriminator in the CycleGAN architecture acts as a deep neural network that distinguishes between real and fake images by producing scalar values between 0 and 1, indicating the probability that the input is real. It is trained as an accurate binary classifier, minimizing the cross-entropy loss between its predictions and the true labels. The discriminator architecture usually consists of a convolutional neural network (CNN) and is trained on both real and generated data to keep its training balanced with that of the generator.

The discriminator is an important part of the CycleGAN architecture, as it acts as an adaptive loss function, learning and adapting to the underlying distribution of the data rather than relying on heuristic techniques [26]. It evaluates the veracity of real and generated images and gradually learns to distinguish between them, thus allowing the generator to produce new, previously unseen data from the latent space. The generators are trained to minimize the logarithmic loss of the discriminator’s output on generated samples, aiming to produce realistic images while minimizing the difference between the generated data and the real training data. The training process of CycleGAN iterates between training the generator and the discriminator in an adversarial manner until convergence, thus generating new data similar to the training data. The discriminator architecture is shown in Figure 3.

Figure 3.

Discriminator architecture
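The following sketch shows a PatchGAN-style discriminator of the kind commonly paired with CycleGAN, with the per-patch scores averaged and squashed to a single probability so the output can be read as described above. The layer widths are illustrative assumptions, not the exact configuration of this paper.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Illustrative PatchGAN-style discriminator; layer widths are assumptions."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        def block(i, o, norm=True):
            layers = [nn.Conv2d(i, o, 4, stride=2, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(o))
            layers.append(nn.LeakyReLU(0.2, True))
            return layers
        self.features = nn.Sequential(
            *block(in_ch, base, norm=False),
            *block(base, base * 2),
            *block(base * 2, base * 4),
            *block(base * 4, base * 8),
            nn.Conv2d(base * 8, 1, 4, padding=1))  # per-patch real/fake score

    def forward(self, x):
        # Average the patch scores and squash to (0, 1) so the output can be read
        # as the probability that the input image is real.
        return torch.sigmoid(self.features(x).mean(dim=[1, 2, 3]))
```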

Network model optimization

Based on the training process of the underlying network model, the loss function used in this paper includes a forward mapping loss L1 for mapping the original sketch image domain A to the sketch style domain B, a backward mapping loss L2 from the sketch style domain B back to the original sketch image domain A, a cyclic consistency loss L3 between the two, and a constant (identity) mapping loss Lid used to ensure that the generative models perform the intended conversion.

Forward mapping loss

The forward mapping loss L1 makes the data distribution of the generated image GAB(a) match the data distribution of the target domain B as closely as possible under the cyclic adversarial interplay of CycleGAN’s generative and discriminative models, which strengthens the conversion of the generated image toward the sketching target style. The loss function for this process is shown in equation (6):

$$L_1(G_{AB}, D_{AB}, A, B) = \mathbb{E}_{b \sim p_{data}(b)}[\log D_{AB}(b)] + \mathbb{E}_{a \sim p_{data}(a)}[\log(1 - D_{AB}(G_{AB}(a)))]$$

where a and b are training samples drawn from the data distributions of the original sketch images and the sketch style migration images, respectively, and GAB and DAB are the generative and discriminative models of CycleGAN. GAB converts the input image domain A into a sketch style migrated image GAB(a) that matches the style of domain B, while DAB discriminates whether an input image comes from a real sketch image of B or is a generated image GAB(a). Both are optimized under the cyclic adversarial training of the generative adversarial network, so that under the action of L1 the data distribution of the generated image GAB(a) is pushed toward the sketch domain B.

Backward mapping loss

In the whole network model, in addition to the generative model GAB synthesizing a sketch style migration image from the original sketch image, the generated sketch image GAB(a) also needs to be reconstructed by the generative model GBA to ensure that the generated image acquires the sketch style without losing the semantic information of the original image. Therefore, a backward training process that maps the sketch image domain B back to the original image domain A is required to strengthen the ability of the generative model GBA to synthesize the original image. In this process the generative models are applied in sequence, a → GAB → b and b → GBA → a, while the discriminative models DAB and DBA estimate the probability that the input data in each case comes from the generated data rather than the real data distribution. The adversarial loss function L2 for the backward mapping is shown in equation (7):

$$L_2(G_{BA}, D_{BA}, B, A) = \mathbb{E}_{a \sim p_{data}(a)}[\log D_{BA}(a)] + \mathbb{E}_{b \sim p_{data}(b)}[\log(1 - D_{BA}(G_{BA}(b)))]$$

where GBA and DBA are the generative and discriminative models of CycleGAN, respectively. GBA converts the input sketch image domain B into an image GBA(b) that conforms to domain A, and DBA discriminates whether an input image comes from the real image domain A or is a generated image GBA(b). Under the cyclic adversarial training of the generative adversarial network, the backward loss function L2 pushes the data distribution of the generated image GBA(b) toward domain A, thus ensuring the ability of the generative model GBA to reconstruct the generated sketch image GAB(a) as the original image A′.

Cyclic consistency loss

The cyclic consistency loss function L3 is used to ensure that the content of the original image in the reconstructed image domain during forward training and backward training remains consistent. The loss function for this process is shown in equation (8):

$$L_3(G_{AB}, G_{BA}) = \mathbb{E}_{a \sim p_{data}(a)}[\|G_{BA}(G_{AB}(a)) - a\|_1] + \mathbb{E}_{b \sim p_{data}(b)}[\|G_{AB}(G_{BA}(b)) - b\|_1]$$

where GBA(GAB(a)) is the reconstruction by GBA of the image GAB(a) generated by the generative model GAB during the forward mapping process, and GAB(GBA(b)) is the reconstruction by GAB of the image GBA(b) generated by the generative model GBA during the backward mapping process. In this process, the consistency between the input and output images is constrained using an L1-regularized loss function.

Constant mapping loss

In the overall network model, the generative models GAB and GBA are used to convert input images into target-domain images. Whether they truly learn the intended conversion is not known a priori, so the image b of domain B is fed into GAB and the image a of domain A is fed into GBA; if the generated results remain images of the corresponding domain, i.e., GAB(b) ≈ b and GBA(a) ≈ a, the generative models GAB and GBA are shown to be effective. The constant mapping loss Lid is used to ensure that the generative models GAB and GBA convert input images into the target-domain images in this sense. The formula is shown in equation (9):

$$L_{id}(G_{AB}, G_{BA}) = \mathbb{E}_{b \sim p_{data}(b)}[\|G_{AB}(b) - b\|_1] + \mathbb{E}_{a \sim p_{data}(a)}[\|G_{BA}(a) - a\|_1]$$

where GAB(b) is the output of the generative model GAB for the domain-B image b, and GBA(a) is the output of the generative model GBA for the domain-A image a.

Overall loss function

Based on the above, the overall loss function Lt used in this paper is:

$$L_t(G_{AB}, G_{BA}, D_{AB}, D_{BA}) = L_1(G_{AB}, D_{AB}, A, B) + L_2(G_{BA}, D_{BA}, B, A) + \lambda_1 L_3(G_{AB}, G_{BA}) + \lambda_2 L_{id}(G_{AB}, G_{BA})$$

where λ1 is the weight of the cycle-consistency loss between the forward and backward mappings, and λ2 is the weight of the constant mapping loss. A and B are, respectively, the real sketch training samples and the real stylized image training samples used in this paper’s method. Ultimately, the optimization objective L of the overall model can be expressed as in equation (11):

$$L(G_{AB}, G_{BA}) = \arg\min_{G} \max_{D} L_t(G_{AB}, G_{BA}, D_{AB}, D_{BA})$$
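A minimal sketch of how the four terms of Lt could be composed for the generator update is given below, assuming discriminators that output probabilities in (0, 1); lambda_1 and lambda_2 are placeholder values, since the text does not report the weights actually used.

```python
import torch
import torch.nn as nn

l1_loss = nn.L1Loss()
bce = nn.BCELoss()  # assumes the discriminators output probabilities in (0, 1)

def total_generator_loss(G_AB, G_BA, D_AB, D_BA, a, b, lambda_1=10.0, lambda_2=5.0):
    # a: batch of original sketch images (domain A); b: batch of style images (domain B).
    # lambda_1 / lambda_2 are placeholder weights, not values reported in the paper.
    fake_b, fake_a = G_AB(a), G_BA(b)

    # L1 and L2: generator side of the forward and backward adversarial losses.
    out_b, out_a = D_AB(fake_b), D_BA(fake_a)
    adv = bce(out_b, torch.ones_like(out_b)) + bce(out_a, torch.ones_like(out_a))

    # L3: cycle-consistency loss between the reconstructions and the inputs.
    cyc = l1_loss(G_BA(fake_b), a) + l1_loss(G_AB(fake_a), b)

    # Lid: constant (identity) mapping loss.
    idt = l1_loss(G_AB(b), b) + l1_loss(G_BA(a), a)

    return adv + lambda_1 * cyc + lambda_2 * idt
```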

By using the above loss functions in the CycleGAN network model, our method achieves relatively stable stylization in the forward mapping from the original sketch image domain A to the sketch style domain B, the backward mapping from the sketch style domain B back to domain A, and the cyclic consistency mapping between the two; it generates images with salient sketch features and effectively improves the visual quality of the generated images.

Experimental results
Algorithmic loss comparison

This experiment uses a computer with a 3.42 GHz CPU, 16 GB of RAM, and an NVIDIA GTX 2080 graphics card, and uses PyCharm to build the generative adversarial network, incorporating the improved processing module into the generator. Content images are selected from the ImageNet dataset, the Wiki-art dataset is chosen to provide the style images for the training set, the remaining images are used as the test set, and all images are resized to 256 × 256 for ease of processing. The model is implemented on a Windows operating system with the PyTorch deep learning framework. The network is optimized with mini-batch gradient descent, with a batch size of 1 and 200 total epochs. The training time of the original model is about 13 hours, while that of the proposed model is about 10 hours.
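For reference, a training-loop skeleton under the stated settings (256 × 256 inputs, batch size 1, 200 epochs) might look as follows; it reuses the loss sketches given earlier, and the Adam optimizer and learning rate are assumptions rather than reported settings.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import transforms

# Preprocessing matching the reported settings; normalization constants are assumed.
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

def train(G_AB, G_BA, D_AB, D_BA, dataset, epochs=200):
    loader = DataLoader(dataset, batch_size=1, shuffle=True)
    opt_G = torch.optim.Adam(list(G_AB.parameters()) + list(G_BA.parameters()), lr=2e-4)
    opt_D = torch.optim.Adam(list(D_AB.parameters()) + list(D_BA.parameters()), lr=2e-4)
    for epoch in range(epochs):
        for a, b in loader:  # one content image (domain A) and one style image (domain B)
            # Update the generators with the combined loss sketched earlier.
            opt_G.zero_grad()
            total_generator_loss(G_AB, G_BA, D_AB, D_BA, a, b).backward()
            opt_G.step()
            # Update the discriminators on real images and freshly generated fakes.
            opt_D.zero_grad()
            d_loss = adversarial_loss_D(D_AB, b, G_AB(a)) + adversarial_loss_D(D_BA, a, G_BA(b))
            d_loss.backward()
            opt_D.step()
```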

The loss comparison of each model is shown in Fig. 4. Compared with other style migration algorithms, the CycleGAN-based algorithm in this paper converges faster, which means that the CycleGAN model can grasp the features of the image more quickly. In addition, this approach allows the model to extract more hidden high-level features from the image, which is why the CycleGAN algorithm shows lower loss than the previous algorithms over the same training range.

Figure 4.

Loss comparison

Comparison of model efficiency

To validate the performance of the algorithm, this chapter compares the proposed algorithm with several mainstream style migration algorithms, including DualGAN, CGGAN, StyleGAN, AdaIN, AdaAttIN, IEST, LapStyle, SANet, CCPL, SSTR and ArtFlow. The test dataset is the PST dataset, which consists of 400 sketch images covering a rich variety of subjects and contents, including portraits, landscapes, still lifes, and other types. Comprehensively testing on these sketch images gives a fuller picture of the model’s adaptability and robustness in various contexts.

In this subsection, the algorithmic model is comprehensively tested on images at three different resolutions: 1000 × 750, 750 × 562, and 500 × 375. To process the data efficiently, we randomly selected 300 images from the PST dataset, of which 150 were used as content images and the other 150 as style images. We then cropped these images to a size of 1000 × 750, downsampled them to the two lower resolutions, and finally averaged over the 150 images to obtain the average style migration time shown in Table 1.
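A simple way to measure the average style migration time per image at a given resolution is sketched below; the resizing step and the use of the generator alone at inference time are assumptions about the benchmarking procedure, not details taken from the paper.

```python
import time
import torch
import torch.nn.functional as F

@torch.no_grad()
def average_transfer_time(model, images, size):
    """Average per-image style migration time in seconds at a given (H, W) resolution.
    `images` is assumed to be a list of preprocessed 3 x H x W tensors."""
    model.eval()
    start = time.perf_counter()
    for img in images:
        x = F.interpolate(img.unsqueeze(0), size=size, mode="bilinear", align_corners=False)
        _ = model(x)
    return (time.perf_counter() - start) / len(images)

# e.g. average_transfer_time(G_AB, test_images, (750, 1000)) for the 1000 x 750 setting
```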

Table 1. Model size and speed comparison (style migration time in seconds at three resolutions)

Algorithm | Parameters | 1000×750 | 750×562 | 500×375
DualGAN | 20.68M | 101.66 | 96.40 | 88.47
CGGAN | 16.92M | 55.78 | 44.61 | 42.79
StyleGAN | 14.55M | 90.33 | 61.15 | 50.73
AdaIN | 8.86M | 23.49 | 22.06 | 17.95
AdaAttIN | 9.04M | 156.85 | 74.74 | 51.66
IEST | 12.01M | 14.79 | 13.16 | 7.76
LapStyle | 12.47M | 98.51 | 94.98 | 36.83
SANet | 20.75M | 119.05 | 116.25 | 33.49
CCPL | 5.69M | 11.42 | 10.46 | 8.28
SSTR | 13.37M | 23.29 | 16.97 | 12.12
ArtFlow | 8.63M | 7.98 | 6.05 | 5.55
Ours | 1.02M | 3.42 | 2.19 | 1.72

Compared to the other models, the CycleGAN model in this paper has the smallest number of parameters (1.02M). In addition, this paper’s model shows the fastest migration speed (3.42 s, 2.19 s, 1.72 s) at every resolution. The CycleGAN algorithm in this paper effectively reduces the number of model parameters and improves the running speed of the model.
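For reference, parameter counts such as those in Table 1 can be obtained for a PyTorch model with a one-line helper; the snippet below is a generic utility, not code from the paper.

```python
def count_parameters(model) -> float:
    """Total number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# e.g. count_parameters(G_AB) for the generator used at inference time
```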

Image Evaluation
Subjective evaluation

In a sense, the evaluation of the effectiveness of stylized migration of sketch images is subjective, so in this paper we conducted a user survey to evaluate the performance of the various algorithms in terms of content quality, stylization strength, and likability. We chose the PST dataset as the baseline dataset, from which 50 sketch images were randomly selected as content images and 50 images as style images, and these were fed into the 12 algorithms. The resulting 600 images were shown to 50 testers, who were asked to select from these 600 generated images the three images that best met each of the three criteria: content quality, stylization strength, and likability. The percentage of each algorithm’s outputs that were selected, out of a total of 150 responses, is shown in Table 2. Compared with the other algorithms, the proposed CycleGAN algorithm outperforms the other models in terms of content quality (64.58%), stylization strength (65.73%), and likability (60.09%), and is preferred by the testers by a large margin.

Table 2. Results of the user study

Algorithm | Content quality (%) | Style strength (%) | Likability (%)
DualGAN | 1.34 | 2.31 | 3.79
CGGAN | 3.72 | 2.17 | 3.59
StyleGAN | 4.88 | 1.33 | 2.13
AdaIN | 1.01 | 3.04 | 4.33
AdaAttIN | 3.37 | 3.87 | 1.81
IEST | 2.59 | 4.28 | 0.96
LapStyle | 3.14 | 5.84 | 6.13
SANet | 4.69 | 2.88 | 4.75
CCPL | 3.66 | 3.89 | 5.41
SSTR | 2.94 | 0.99 | 4.75
ArtFlow | 4.08 | 3.67 | 2.26
Ours | 64.58 | 65.73 | 60.09

Objective evaluation

In order to validate the effectiveness of the method and objectively evaluate the difference between this paper’s method and other methods, the paper uses structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) for evaluation. The deep learning algorithms are used to perform style migration on two original sketch works, and the SSIM and PSNR of the images under each style are evaluated; the results are shown in Table 3. The migrated styles comprise five styles: Rubens, Dürer, Zorn, Menzel, and Seurat.
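A minimal sketch of this objective evaluation using scikit-image’s SSIM and PSNR implementations is shown below; the uint8 RGB input format is an assumption, and the channel_axis argument requires a recent scikit-image version.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate_pair(original: np.ndarray, stylized: np.ndarray):
    """SSIM and PSNR between an original sketch and its style-migrated result.
    Both inputs are assumed to be uint8 RGB arrays of identical shape."""
    ssim = structural_similarity(original, stylized, channel_axis=-1)
    psnr = peak_signal_noise_ratio(original, stylized)
    return ssim, psnr
```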

Table 3. Evaluation results

SSIM | Rubens | Dürer | Zorn | Menzel | Seurat
DualGAN | 0.444 | 0.513 | 0.191 | 0.471 | 0.377
CGGAN | 0.433 | 0.517 | 0.172 | 0.458 | 0.226
StyleGAN | 0.474 | 0.545 | 0.128 | 0.398 | 0.244
AdaIN | 0.464 | 0.552 | 0.193 | 0.451 | 0.371
AdaAttIN | 0.478 | 0.571 | 0.243 | 0.405 | 0.298
IEST | 0.452 | 0.487 | 0.182 | 0.431 | 0.238
LapStyle | 0.419 | 0.503 | 0.136 | 0.367 | 0.355
SANet | 0.434 | 0.486 | 0.254 | 0.472 | 0.281
CCPL | 0.409 | 0.524 | 0.216 | 0.401 | 0.346
SSTR | 0.418 | 0.556 | 0.214 | 0.451 | 0.358
ArtFlow | 0.485 | 0.495 | 0.242 | 0.409 | 0.384
Ours | 0.498 | 0.576 | 0.257 | 0.484 | 0.392

PSNR | Rubens | Dürer | Zorn | Menzel | Seurat
DualGAN | 11.075 | 12.213 | 8.684 | 10.956 | 10.288
CGGAN | 10.863 | 12.504 | 7.971 | 10.406 | 10.608
StyleGAN | 12.343 | 14.297 | 7.707 | 11.523 | 10.346
AdaIN | 12.424 | 14.153 | 8.991 | 12.178 | 9.584
AdaAttIN | 10.507 | 12.897 | 8.179 | 11.574 | 10.471
IEST | 13.572 | 13.591 | 8.999 | 10.275 | 9.634
LapStyle | 10.772 | 13.091 | 8.224 | 10.393 | 10.974
SANet | 13.277 | 13.841 | 7.662 | 13.092 | 9.917
CCPL | 13.108 | 11.071 | 8.367 | 11.243 | 9.308
SSTR | 11.381 | 13.022 | 8.426 | 12.418 | 10.649
ArtFlow | 10.866 | 13.301 | 8.027 | 10.567 | 9.531
Ours | 13.808 | 14.573 | 9.087 | 13.607 | 11.001

From the data in Table 3, it can be clearly seen that the CycleGAN model used in this paper achieves SSIM values of 0.498, 0.576, 0.257, 0.484, and 0.392 for the five styles of Rubens, Dürer, Zorn, Menzel, and Seurat, respectively, and PSNR values of 13.808, 14.573, 9.087, 13.607, and 11.001, with both metrics reaching the maximum values among all models. This means that the images generated by the CycleGAN method are the closest to the original images in structure, and the quality of the images produced by this paper’s CycleGAN method is significantly better than that of the other methods. In the comparison of evaluation indexes, the CycleGAN method of this paper exceeds the comparison methods on all the image style transfer data. This is reflected not only in the numerical values but also, more intuitively, in the visual quality of the generated images, which is better for this paper’s method. Therefore, the method presented in this paper is feasible and effective.

Conclusion

The article constructs a CycleGAN style migration model for sketches by improving the generative adversarial network, migrating sketches into target styles and thereby assisting artistic creation. The effectiveness of this paper’s algorithm is verified by comparing other style migration algorithms with this paper’s CycleGAN algorithm in terms of algorithmic loss, model efficiency, and the subjective and objective evaluation of style-migrated images.

The CycleGAN algorithm in this paper converges faster than the other image style migration algorithms. The model in this paper has 1.02M parameters, the smallest number among all the image style migration models compared. On images of 1000 × 750, 750 × 562 and 500 × 375 resolution, the style migration times of this paper’s CycleGAN model are 3.42 s, 2.19 s and 1.72 s, respectively, and its running efficiency is much higher than that of the other models. In the subjective evaluation of the style-migrated sketch images, the content quality, stylization strength and likability of this paper’s CycleGAN model are 64.58%, 65.73% and 60.09%, respectively, all above 60%, the best subjective evaluation results. In the objective evaluation, the SSIM values of this paper’s CycleGAN model for the five styles are 0.498, 0.576, 0.257, 0.484 and 0.392, and the PSNR values are 13.808, 14.573, 9.087, 13.607 and 11.001, respectively, all of which are the best among all models.