Open Access

Deep Learning-based Research on Stylistic Migration and Creative Assistance for Drawing Artworks

19 March 2025


Introduction

From primitive drawing as the foundation of painting to creative drawing as a mature, independent form of painting, research into the language and art form of contemporary drawing has become open and diverse [1-2]. Creative drawing, an emerging art of drawing that fuses concept, aesthetics, form, technique, material and other elements into a single independent work, holds great value for exploration and research. Its rise promotes not only the development of creative drawing and the art of drawing itself, but also closer interoperability and integration among the various art disciplines, so that they develop and progress together [3-6]. Drawing is familiar to both the public and artists, but for a long time, and especially as art has developed into the contemporary era, people, particularly in China, have held ambiguous views about drawing and about the understanding and awareness of creative drawing. The root of this problem is that people overlook the important role of “creativity” in the process of drawing and painting [7-9]. Here, “creativity” refers to the conceptual, concrete and systematic creations that people make by borrowing the art form of sketching. It is therefore important to distinguish between “the basic nature of drawing (work-in-progress drawing)” and “the creative nature of drawing (creative drawing)”. The formation of this question has led to a new understanding and exploration of the ultimate significance of the artistic language of “sketching” [10-12]. The initial understanding of sketching covers basic modeling training and the artist’s pre-creation sketch, that is, the initial stage of learning and mastering painting, the artist’s preparatory sketches and partial explorations, and the construction stage of the complete creative concept; it is precisely in this initial practice of basic modeling, reflection and exploration that the artist finds his or her own “creative” language [13-15]. The style migration technique has attracted much attention in the field of computer vision: it realizes the stylistic transformation of images and draws on image processing, artificial intelligence, computer art and other fields. The rise of this research direction stems from the pursuit of novel approaches to image processing and art creation, and is also driven by the continuous development of computer vision and machine learning techniques [16-18]. Research on style migration has changed the role of computers in art creation and has been furthered by the wave of deep learning. Using deep learning to assist the style migration and creation of drawing artworks can promote the innovative thinking of artists, provide a pioneering and learnable model of innovation for other forms of painting, and promote the development and prosperity of the art of painting [19-22].

In this paper, the Generative Adversarial Network (GAN) in deep learning is used as the substrate: the traditional generative adversarial network is improved, and the Cycle-Consistent Generative Adversarial Network (CycleGAN) is chosen as the method for image style migration of sketch works. Optimization and loss function design of the network model are carried out to construct a style migration model for sketch works based on the improved GAN. Relevant experiments are designed to compare the algorithmic loss of this paper’s CycleGAN model with that of other image style migration models, to compare the number of parameters and running time of each image style migration model, and to test the running efficiency of this paper’s CycleGAN model. In addition, subjective and objective evaluations of the style-migrated sketches generated by each image style migration model are conducted to assess the quality of the images generated by this paper’s CycleGAN model.

Improved GAN-based style migration model for sketches
Cycle-Consistent Generative Adversarial Network (CycleGAN)

Zhu et al. first proposed the Cycle-Consistent Generative Adversarial Network (CycleGAN) to realize migration between asymmetric data [23-24]. CycleGAN has wide applicability and utility because it accepts unpaired training data and learns the mapping between a source domain and a target domain; this ability allows CycleGAN to perform tasks such as image style migration without requiring paired training data. The model is a cyclic adversarial training structure, an image style migration model based on the idea of pairwise mapping, and consists of two GAN networks, that is, two generators G and F and two discriminators DX and DY. The CycleGAN network architecture is shown in Fig. 1. The network learns the image migration task from domain X to domain Y and uses an adversarial loss function to learn the generator G: X → Y, such that the discriminator cannot distinguish the generated image from the original image. The network also maps domain Y back to domain X, which is realized by the generator F, i.e., F: Y → X. In addition, the two discriminators DX and DY judge the migrated images to ensure image quality. In the loss function, the cycle-consistency loss is introduced as a constraint between the two generators. It ensures that the generators produce images corresponding to the original image, preventing them from deceiving the discriminator by generating arbitrary images in the real image domain; it guarantees that G and F are inverses of each other and limits the space of generated images in a practically meaningful way.

Figure 1.

CycleGAN network architecture

As shown in panel (a) of the figure above, the real image PX of domain X is first migrated into a fake image PY of domain Y by generator G, and PY is then reconstructed by generator F; through this reconstruction the feature information of the original image is preserved. The process is cyclically consistent, i.e., x → G(x) → F(G(x)) ≈ x and y → F(y) → G(F(y)) ≈ y. In the figure, (b) is the forward cycle consistency and (c) is the reverse cycle consistency. DY is used to judge whether an image belongs to domain Y: the image x passes through generator G to produce the domain-Y image ŷ, the generator F reconstructs ŷ into an image x̂ similar to x, and x should be as consistent with x̂ as possible (and vice versa); the distance between the two is the cycle-consistency loss. The adversarial loss function in Fig. (b) is as follows:

$$L_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$$

The adversarial loss function in Fig. (c) is as follows:

$$L_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_X(F(y)))]$$

The cycle-consistency loss function, using an L1 loss, is given by:

$$L_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\|F(G(x)) - x\|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\|G(F(y)) - y\|_1]$$

The resulting CycleGAN total loss function can be expressed as:

$$L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + \lambda L_{cyc}(G, F)$$

where λ controls the relative weight of the cycle-consistency term. As in the original GAN, we solve a minimax problem:

$$G^*, F^* = \arg\min_{G, F} \max_{D_X, D_Y} L(G, F, D_X, D_Y)$$
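To make the objective concrete, the following PyTorch sketch shows one way to compute the adversarial and cycle-consistency terms above. It is an illustrative implementation rather than the exact code of this paper: the discriminators are assumed to output probabilities in (0, 1), and the weight lam plays the role of λ.

```python
import torch
import torch.nn as nn

# Adversarial loss in the log form above and L1 cycle-consistency loss.
bce = nn.BCELoss()
l1 = nn.L1Loss()

def adversarial_loss_D(D, real, fake):
    # The discriminator is pushed toward 1 on real images and 0 on generated ones.
    out_real = D(real)
    out_fake = D(fake.detach())  # do not backpropagate into the generator here
    return bce(out_real, torch.ones_like(out_real)) + \
           bce(out_fake, torch.zeros_like(out_fake))

def generator_objective(G, F, D_X, D_Y, x, y, lam=10.0):
    # G: X -> Y and F: Y -> X, as in the equations above.
    fake_y, fake_x = G(x), F(y)
    out_y, out_x = D_Y(fake_y), D_X(fake_x)
    # The generators try to make the discriminators label their outputs as real.
    loss_gan = bce(out_y, torch.ones_like(out_y)) + bce(out_x, torch.ones_like(out_x))
    # Cycle consistency: F(G(x)) ≈ x and G(F(y)) ≈ y, measured with the L1 norm.
    loss_cyc = l1(F(fake_y), x) + l1(G(fake_x), y)
    return loss_gan + lam * loss_cyc
```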

Generator and Discriminator Architecture
Generator Architecture

A generator takes random noise as input and produces synthetic samples that resemble real training data [25]. A generator usually consists of one or more deep neural networks, often using convolutional layers to generate images or recurrent layers to generate sequential data. The output of the generator is fed to a discriminator, which is then trained to distinguish the generated samples from the real training data. The generator architecture is shown in Figure 2.

Figure 2.

Generator architecture

The generator is a key building block in the CycleGAN architecture, and understanding its role and structure is critical to understanding the CycleGAN training process. The generator pipeline consists of three components: the latent space, the generator, and the image generation stage. The generator samples from the latent space and establishes a relationship between the latent space and the output; a neural network is then created that maps from this input (the latent space) to the output (in most cases an image). During adversarial training, the generator and discriminator are connected in one model, and the generator is trained to produce images that are indistinguishable from real images. Ultimately, the generator produces the output images we see after the entire training process. When training CycleGAN, the focus is on training the generator, whereas in most architectures the discriminator requires several epochs of training before generator training begins.

Each component of the CycleGAN architecture is defined as a class, and the generator class has three main functions: the class template, the loss function, and the buildModel function. The loss function is a custom loss function that is used to train the model when needed, while the buildModel function builds the actual neural network model. Model-specific training sequences will be included in this class, although we may only use internal training methods for discriminators.
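As a concrete illustration of what such a buildModel step might construct, the sketch below shows a typical image-to-image generator of the kind used in CycleGAN (downsampling convolutions, residual blocks, upsampling convolutions). The layer widths and block count are illustrative assumptions, not the exact configuration used in this paper.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block commonly used in CycleGAN generators (assumed here)."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.block(x)

class Generator(nn.Module):
    """Illustrative generator: downsample -> residual blocks -> upsample."""
    def __init__(self, in_ch=3, base=64, n_blocks=6):
        super().__init__()
        layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, base, 7),
                  nn.InstanceNorm2d(base), nn.ReLU(True),
                  nn.Conv2d(base, base * 2, 3, stride=2, padding=1),
                  nn.InstanceNorm2d(base * 2), nn.ReLU(True),
                  nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1),
                  nn.InstanceNorm2d(base * 4), nn.ReLU(True)]
        layers += [ResBlock(base * 4) for _ in range(n_blocks)]
        layers += [nn.ConvTranspose2d(base * 4, base * 2, 3, stride=2, padding=1, output_padding=1),
                   nn.InstanceNorm2d(base * 2), nn.ReLU(True),
                   nn.ConvTranspose2d(base * 2, base, 3, stride=2, padding=1, output_padding=1),
                   nn.InstanceNorm2d(base), nn.ReLU(True),
                   nn.ReflectionPad2d(3), nn.Conv2d(base, in_ch, 7), nn.Tanh()]
        self.model = nn.Sequential(*layers)
    def forward(self, x):
        return self.model(x)
```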

Discriminator Architecture

The discriminator in the CycleGAN architecture acts as a deep neural network that distinguishes between real and fake images by producing scalar values between 0 and 1, indicating the probability that the input is real. It is trained as an accurate binary classifier, minimizing the cross-entropy loss between its predictions and the true labels. The discriminator architecture usually consists of a convolutional neural network (CNN) and is trained on both real and generated data to keep its training balanced with that of the generator.

The discriminator is an important part of the CycleGAN architecture, as it acts as an adaptive loss function, learning and adapting to the underlying distribution of the data rather than relying on heuristic techniques [26]. It evaluates the veracity of real and generated images and gradually learns to distinguish between them, thus allowing the generator to produce new, previously unseen data from the latent space. The generators are trained to minimize the logarithmic loss of the discriminator’s output on generated samples, aiming to produce realistic images while minimizing the difference between the generated data and the real training data. The training process of CycleGAN iterates between training the generator and the discriminator in an adversarial manner until convergence, thus generating new data similar to the training data. The discriminator architecture is shown in Figure 3.

Figure 3.

Discriminator architecture
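The following sketch shows a PatchGAN-style discriminator of the kind commonly paired with CycleGAN, with the per-patch scores averaged and squashed to a single probability so the output can be read as described above. The layer widths are illustrative assumptions, not the exact configuration of this paper.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Illustrative PatchGAN-style discriminator; layer widths are assumptions."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        def block(i, o, norm=True):
            layers = [nn.Conv2d(i, o, 4, stride=2, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(o))
            layers.append(nn.LeakyReLU(0.2, True))
            return layers
        self.features = nn.Sequential(
            *block(in_ch, base, norm=False),
            *block(base, base * 2),
            *block(base * 2, base * 4),
            *block(base * 4, base * 8),
            nn.Conv2d(base * 8, 1, 4, padding=1))  # per-patch real/fake score

    def forward(self, x):
        # Average the patch scores and squash to (0, 1) so the output can be read
        # as the probability that the input image is real.
        return torch.sigmoid(self.features(x).mean(dim=[1, 2, 3]))
```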

Network model optimization

Based on the training process of the underlying network model, the loss function used in this paper includes a forward mapping loss L1 for mapping the original sketch image domain A to the sketch style domain B, a backward mapping loss L2 from the sketch style domain B back to the original sketch image domain A, a cyclic consistency loss L3 between the two, and a constant (identity) mapping loss Lid used to ensure that the generative models perform the intended conversion.

Forward mapping loss

The forward mapping loss L1 makes the data distribution of the generated image GAB(a) match the data distribution of the target domain B as closely as possible under the cyclic adversarial interplay of CycleGAN’s generative and discriminative models, which strengthens the conversion of the generated image toward the sketching target style. The loss function for this process is shown in equation (6):

$$L_1(G_{AB}, D_{AB}, A, B) = \mathbb{E}_{b \sim p_{data}(b)}[\log D_{AB}(b)] + \mathbb{E}_{a \sim p_{data}(a)}[\log(1 - D_{AB}(G_{AB}(a)))]$$

where a and b are training samples drawn from the data distributions of the original sketch images and the sketch style migration images, respectively, and GAB and DAB are the generative and discriminative models of CycleGAN. GAB converts the input image domain A into a sketch style migrated image GAB(a) that matches the style of domain B, while DAB discriminates whether an input image comes from a real sketch image of B or is a generated image GAB(a). Both are optimized under the cyclic adversarial training of the generative adversarial network, so that under the action of L1 the data distribution of the generated image GAB(a) is pushed toward the sketch domain B.

Backward mapping loss

In the whole network model, in addition to the generative model GAB synthesizing a sketch style migration image from the original sketch image, the generated sketch image GAB(a) also needs to be reconstructed by the generative model GBA to ensure that the generated image acquires the sketch style without losing the semantic information of the original image. Therefore, a backward training process that maps the sketch image domain B back to the original image domain A is required to strengthen the ability of the generative model GBA to synthesize the original image. In this process the generative models are applied in sequence, a → GAB → b and b → GBA → a, while the discriminative models DAB and DBA estimate the probability that the input data in each case comes from the generated data rather than the real data distribution. The adversarial loss function L2 for the backward mapping is shown in equation (7):

$$L_2(G_{BA}, D_{BA}, B, A) = \mathbb{E}_{a \sim p_{data}(a)}[\log D_{BA}(a)] + \mathbb{E}_{b \sim p_{data}(b)}[\log(1 - D_{BA}(G_{BA}(b)))]$$

where GBA and DBA are the generative and discriminative models of CycleGAN, respectively. GBA converts the input sketch image domain B into an image GBA(b) that conforms to domain A, and DBA discriminates whether an input image comes from the real image domain A or is a generated image GBA(b). Under the cyclic adversarial training of the generative adversarial network, the backward loss function L2 pushes the data distribution of the generated image GBA(b) toward domain A, thus ensuring the ability of the generative model GBA to reconstruct the generated sketch image GAB(a) as the original image A′.

Cyclic consistency loss

The cyclic consistency loss function L3 is used to ensure that the content of the original image in the reconstructed image domain during forward training and backward training remains consistent. The loss function for this process is shown in equation (8):

$$L_3(G_{AB}, G_{BA}) = \mathbb{E}_{a \sim p_{data}(a)}[\|G_{BA}(G_{AB}(a)) - a\|_1] + \mathbb{E}_{b \sim p_{data}(b)}[\|G_{AB}(G_{BA}(b)) - b\|_1]$$

where GBA(GAB(a)) is the reconstruction by GBA of the image GAB(a) generated by the generative model GAB during the forward mapping process, and GAB(GBA(b)) is the reconstruction by GAB of the image GBA(b) generated by the generative model GBA during the backward mapping process. In this process, the consistency between the input and output images is constrained using an L1-regularized loss function.

Constant mapping loss

In the overall network model, the generative models GAB and GBA are used to convert input images into target-domain images. Whether they truly learn the intended conversion is not known a priori, so the image b of domain B is fed into GAB and the image a of domain A is fed into GBA; if the generated results remain images of the corresponding domain, i.e., GAB(b) ≈ b and GBA(a) ≈ a, the generative models GAB and GBA are shown to be effective. The constant mapping loss Lid is used to ensure that the generative models GAB and GBA convert input images into the target-domain images in this sense. The formula is shown in equation (9):

$$L_{id}(G_{AB}, G_{BA}) = \mathbb{E}_{b \sim p_{data}(b)}[\|G_{AB}(b) - b\|_1] + \mathbb{E}_{a \sim p_{data}(a)}[\|G_{BA}(a) - a\|_1]$$

where GAB(b) is the output of the generative model GAB for the domain-B image b, and GBA(a) is the output of the generative model GBA for the domain-A image a.

Overall loss function

Based on the above, the overall loss function Lt used in this paper is:

$$L_t(G_{AB}, G_{BA}, D_{AB}, D_{BA}) = L_1(G_{AB}, D_{AB}, A, B) + L_2(G_{BA}, D_{BA}, B, A) + \lambda_1 L_3(G_{AB}, G_{BA}) + \lambda_2 L_{id}(G_{AB}, G_{BA})$$

where λ1 is the weight of the cycle-consistency loss between the forward and backward mappings, and λ2 is the weight of the constant mapping loss. A and B are, respectively, the real sketch training samples and the real stylized image training samples used in this paper’s method. Ultimately, the optimization objective L of the overall model can be expressed as in equation (11):

$$L(G_{AB}, G_{BA}) = \arg\min_{G} \max_{D} L_t(G_{AB}, G_{BA}, D_{AB}, D_{BA})$$
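A minimal sketch of how the four terms of Lt could be composed for the generator update is given below, assuming discriminators that output probabilities in (0, 1); lambda_1 and lambda_2 are placeholder values, since the text does not report the weights actually used.

```python
import torch
import torch.nn as nn

l1_loss = nn.L1Loss()
bce = nn.BCELoss()  # assumes the discriminators output probabilities in (0, 1)

def total_generator_loss(G_AB, G_BA, D_AB, D_BA, a, b, lambda_1=10.0, lambda_2=5.0):
    # a: batch of original sketch images (domain A); b: batch of style images (domain B).
    # lambda_1 / lambda_2 are placeholder weights, not values reported in the paper.
    fake_b, fake_a = G_AB(a), G_BA(b)

    # L1 and L2: generator side of the forward and backward adversarial losses.
    out_b, out_a = D_AB(fake_b), D_BA(fake_a)
    adv = bce(out_b, torch.ones_like(out_b)) + bce(out_a, torch.ones_like(out_a))

    # L3: cycle-consistency loss between the reconstructions and the inputs.
    cyc = l1_loss(G_BA(fake_b), a) + l1_loss(G_AB(fake_a), b)

    # Lid: constant (identity) mapping loss.
    idt = l1_loss(G_AB(b), b) + l1_loss(G_BA(a), a)

    return adv + lambda_1 * cyc + lambda_2 * idt
```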

By using the above loss functions in the CycleGAN network model, our method achieves relatively stable stylization in the forward mapping from the original sketch image domain A to the sketch style domain B, the backward mapping from the sketch style domain B back to domain A, and the cyclic consistency mapping between the two; it generates images with salient sketch features and effectively improves the visual quality of the generated images.

Experimental results
Algorithmic loss comparison

This experiment uses a computer with a 3.42 GHz CPU, 16 GB of RAM, and an NVIDIA GTX 2080 graphics card, and uses PyCharm to build the generative adversarial network, incorporating the improved processing module into the generator. Content images are selected from the ImageNet dataset, the Wiki-art dataset is chosen to provide the style images for the training set, the remaining images are used as the test set, and all images are resized to 256 × 256 for ease of processing. The model is implemented on a Windows operating system with the PyTorch deep learning framework. The network is optimized with mini-batch gradient descent, with a batch size of 1 and 200 total epochs. The training time of the original model is about 13 hours, while that of the proposed model is about 10 hours.
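For reference, a training-loop skeleton under the stated settings (256 × 256 inputs, batch size 1, 200 epochs) might look as follows; it reuses the loss sketches given earlier, and the Adam optimizer and learning rate are assumptions rather than reported settings.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import transforms

# Preprocessing matching the reported settings; normalization constants are assumed.
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

def train(G_AB, G_BA, D_AB, D_BA, dataset, epochs=200):
    loader = DataLoader(dataset, batch_size=1, shuffle=True)
    opt_G = torch.optim.Adam(list(G_AB.parameters()) + list(G_BA.parameters()), lr=2e-4)
    opt_D = torch.optim.Adam(list(D_AB.parameters()) + list(D_BA.parameters()), lr=2e-4)
    for epoch in range(epochs):
        for a, b in loader:  # one content image (domain A) and one style image (domain B)
            # Update the generators with the combined loss sketched earlier.
            opt_G.zero_grad()
            total_generator_loss(G_AB, G_BA, D_AB, D_BA, a, b).backward()
            opt_G.step()
            # Update the discriminators on real images and freshly generated fakes.
            opt_D.zero_grad()
            d_loss = adversarial_loss_D(D_AB, b, G_AB(a)) + adversarial_loss_D(D_BA, a, G_BA(b))
            d_loss.backward()
            opt_D.step()
```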

The loss comparison of each model is shown in Fig. 4. Compared with other style migration algorithms, the CycleGAN-based algorithm in this paper converges faster, which means that the CycleGAN model can grasp the features of the image more quickly. In addition, this approach allows the model to extract more hidden high-level features from the image, which is why the CycleGAN algorithm shows lower loss than the previous algorithms over the same training range.

Figure 4.

Loss comparison

Comparison of model efficiency

To validate the performance of the algorithm, this chapter compares the proposed algorithm with several mainstream style migration algorithms, including DualGAN, CGGAN, StyleGAN, AdaIN, AdaAttIN, IEST, LapStyle, SANet, CCPL, SSTR and ArtFlow. The test dataset is the PST dataset, which consists of 400 sketch images covering a rich variety of subjects and contents, including portraits, landscapes, still lifes, and other types. Comprehensively testing on these sketch images gives a fuller picture of the model’s adaptability and robustness in various contexts.

In this subsection, the algorithmic model is comprehensively tested on images at three different resolutions: 1000 × 750, 750 × 562, and 500 × 375. To process the data efficiently, we randomly selected 300 images from the PST dataset, of which 150 were used as content images and the other 150 as style images. We then cropped these images to a size of 1000 × 750, downsampled them to the two lower resolutions, and finally averaged over the 150 images to obtain the average style migration time shown in Table 1.
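A simple way to measure the average style migration time per image at a given resolution is sketched below; the resizing step and the use of the generator alone at inference time are assumptions about the benchmarking procedure, not details taken from the paper.

```python
import time
import torch
import torch.nn.functional as F

@torch.no_grad()
def average_transfer_time(model, images, size):
    """Average per-image style migration time in seconds at a given (H, W) resolution.
    `images` is assumed to be a list of preprocessed 3 x H x W tensors."""
    model.eval()
    start = time.perf_counter()
    for img in images:
        x = F.interpolate(img.unsqueeze(0), size=size, mode="bilinear", align_corners=False)
        _ = model(x)
    return (time.perf_counter() - start) / len(images)

# e.g. average_transfer_time(G_AB, test_images, (750, 1000)) for the 1000 x 750 setting
```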

Table 1. Model size and speed comparison (style migration time in seconds at three resolutions)

Algorithm | Parameters | 1000×750 | 750×562 | 500×375
DualGAN | 20.68M | 101.66 | 96.40 | 88.47
CGGAN | 16.92M | 55.78 | 44.61 | 42.79
StyleGAN | 14.55M | 90.33 | 61.15 | 50.73
AdaIN | 8.86M | 23.49 | 22.06 | 17.95
AdaAttIN | 9.04M | 156.85 | 74.74 | 51.66
IEST | 12.01M | 14.79 | 13.16 | 7.76
LapStyle | 12.47M | 98.51 | 94.98 | 36.83
SANet | 20.75M | 119.05 | 116.25 | 33.49
CCPL | 5.69M | 11.42 | 10.46 | 8.28
SSTR | 13.37M | 23.29 | 16.97 | 12.12
ArtFlow | 8.63M | 7.98 | 6.05 | 5.55
Ours | 1.02M | 3.42 | 2.19 | 1.72

Compared to the other models, the CycleGAN model in this paper has the smallest number of parameters (1.02M). In addition, this paper’s model shows the fastest migration speed (3.42 s, 2.19 s, 1.72 s) at every resolution. The CycleGAN algorithm in this paper effectively reduces the number of model parameters and improves the running speed of the model.
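For reference, parameter counts such as those in Table 1 can be obtained for a PyTorch model with a one-line helper; the snippet below is a generic utility, not code from the paper.

```python
def count_parameters(model) -> float:
    """Total number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# e.g. count_parameters(G_AB) for the generator used at inference time
```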

Image Evaluation
Subjective evaluation

In a sense, the evaluation of the effectiveness of stylized migration of sketch images is subjective, so in this paper we conducted a user survey to evaluate the performance of the various algorithms in terms of content quality, stylization strength, and likability. We chose the PST dataset as the baseline dataset, from which 50 sketch images were randomly selected as content images and 50 images as style images, and these were fed into the 12 algorithms. The resulting 600 images were shown to 50 testers, who were asked to select from these 600 generated images the three images that best met each of the three criteria: content quality, stylization strength, and likability. The percentage of each algorithm’s outputs that were selected, out of a total of 150 responses, is shown in Table 2. Compared with the other algorithms, the proposed CycleGAN algorithm outperforms the other models in terms of content quality (64.58%), stylization strength (65.73%), and likability (60.09%), and is preferred by the testers by a large margin.

Table 2. Results of the user study

Algorithm | Content quality (%) | Style strength (%) | Likability (%)
DualGAN | 1.34 | 2.31 | 3.79
CGGAN | 3.72 | 2.17 | 3.59
StyleGAN | 4.88 | 1.33 | 2.13
AdaIN | 1.01 | 3.04 | 4.33
AdaAttIN | 3.37 | 3.87 | 1.81
IEST | 2.59 | 4.28 | 0.96
LapStyle | 3.14 | 5.84 | 6.13
SANet | 4.69 | 2.88 | 4.75
CCPL | 3.66 | 3.89 | 5.41
SSTR | 2.94 | 0.99 | 4.75
ArtFlow | 4.08 | 3.67 | 2.26
Ours | 64.58 | 65.73 | 60.09

Objective evaluation

In order to validate the effectiveness of the method and objectively evaluate the difference between this paper’s method and other methods, the paper uses structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) for evaluation. The deep learning algorithms are used to perform style migration on two original sketch works, and the SSIM and PSNR of the images under each style are evaluated; the results are shown in Table 3. The migrated styles comprise five styles: Rubens, Dürer, Zorn, Menzel, and Seurat.
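A minimal sketch of this objective evaluation using scikit-image’s SSIM and PSNR implementations is shown below; the uint8 RGB input format is an assumption, and the channel_axis argument requires a recent scikit-image version.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate_pair(original: np.ndarray, stylized: np.ndarray):
    """SSIM and PSNR between an original sketch and its style-migrated result.
    Both inputs are assumed to be uint8 RGB arrays of identical shape."""
    ssim = structural_similarity(original, stylized, channel_axis=-1)
    psnr = peak_signal_noise_ratio(original, stylized)
    return ssim, psnr
```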

Table 3. Evaluation results

SSIM | Rubens | Dürer | Zorn | Menzel | Seurat
DualGAN | 0.444 | 0.513 | 0.191 | 0.471 | 0.377
CGGAN | 0.433 | 0.517 | 0.172 | 0.458 | 0.226
StyleGAN | 0.474 | 0.545 | 0.128 | 0.398 | 0.244
AdaIN | 0.464 | 0.552 | 0.193 | 0.451 | 0.371
AdaAttIN | 0.478 | 0.571 | 0.243 | 0.405 | 0.298
IEST | 0.452 | 0.487 | 0.182 | 0.431 | 0.238
LapStyle | 0.419 | 0.503 | 0.136 | 0.367 | 0.355
SANet | 0.434 | 0.486 | 0.254 | 0.472 | 0.281
CCPL | 0.409 | 0.524 | 0.216 | 0.401 | 0.346
SSTR | 0.418 | 0.556 | 0.214 | 0.451 | 0.358
ArtFlow | 0.485 | 0.495 | 0.242 | 0.409 | 0.384
Ours | 0.498 | 0.576 | 0.257 | 0.484 | 0.392

PSNR | Rubens | Dürer | Zorn | Menzel | Seurat
DualGAN | 11.075 | 12.213 | 8.684 | 10.956 | 10.288
CGGAN | 10.863 | 12.504 | 7.971 | 10.406 | 10.608
StyleGAN | 12.343 | 14.297 | 7.707 | 11.523 | 10.346
AdaIN | 12.424 | 14.153 | 8.991 | 12.178 | 9.584
AdaAttIN | 10.507 | 12.897 | 8.179 | 11.574 | 10.471
IEST | 13.572 | 13.591 | 8.999 | 10.275 | 9.634
LapStyle | 10.772 | 13.091 | 8.224 | 10.393 | 10.974
SANet | 13.277 | 13.841 | 7.662 | 13.092 | 9.917
CCPL | 13.108 | 11.071 | 8.367 | 11.243 | 9.308
SSTR | 11.381 | 13.022 | 8.426 | 12.418 | 10.649
ArtFlow | 10.866 | 13.301 | 8.027 | 10.567 | 9.531
Ours | 13.808 | 14.573 | 9.087 | 13.607 | 11.001

From the data in Table 3, it can be clearly seen that the CycleGAN model used in this paper achieves SSIM values of 0.498, 0.576, 0.257, 0.484, and 0.392 for the five styles of Rubens, Dürer, Zorn, Menzel, and Seurat, respectively, and PSNR values of 13.808, 14.573, 9.087, 13.607, and 11.001, with both metrics reaching the maximum values among all models. This means that the images generated by the CycleGAN method are the closest to the original images in structure, and the quality of the images produced by this paper’s CycleGAN method is significantly better than that of the other methods. In the comparison of evaluation indexes, the CycleGAN method of this paper exceeds the comparison methods on all the image style transfer data. This is reflected not only in the numerical values but also, more intuitively, in the visual quality of the generated images, which is better for this paper’s method. Therefore, the method presented in this paper is feasible and effective.

Conclusion

The article constructs a CycleGAN style migration model for sketches by improving the generative adversarial network, migrating sketches into target styles and thereby assisting artistic creation. The effectiveness of this paper’s algorithm is verified by comparing other style migration algorithms with this paper’s CycleGAN algorithm in terms of algorithmic loss, model efficiency, and the subjective and objective evaluation of style-migrated images.

The CycleGAN algorithm in this paper converges faster than the other image style migration algorithms. The model in this paper has 1.02M parameters, the smallest number among all the image style migration models compared. On images of 1000 × 750, 750 × 562 and 500 × 375 resolution, the style migration times of this paper’s CycleGAN model are 3.42 s, 2.19 s and 1.72 s, respectively, and its running efficiency is much higher than that of the other models. In the subjective evaluation of the style-migrated sketch images, the content quality, stylization strength and likability of this paper’s CycleGAN model are 64.58%, 65.73% and 60.09%, respectively, all above 60%, the best subjective evaluation results. In the objective evaluation, the SSIM values of this paper’s CycleGAN model for the five styles are 0.498, 0.576, 0.257, 0.484 and 0.392, and the PSNR values are 13.808, 14.573, 9.087, 13.607 and 11.001, respectively, all of which are the best among all models.