Deep Learning-based Research on Stylistic Migration and Creative Assistance for Drawing Artworks
Published online: 19 Mar 2025
Received: 31 Oct 2024
Accepted: 13 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0495
© 2025 Chao Jiang, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
From the development of primitive drawing as the foundation of painting to creative drawing as a mature, independent form of work, research into the language and artistic form of contemporary drawing has become open and diverse [1–2]. Creative drawing, which fuses concept, aesthetics, form, technique, material and other elements into a single independent artwork, is an emerging art of drawing with great value for exploration and research. Its rise has strongly promoted the development of painting: it advances not only creative drawing and the art of drawing itself, but also closer exchange and integration among the various art disciplines, allowing them to develop and progress together [3–6]. Drawing is familiar to both the public and artists, yet for a long time, and especially as art has developed into the contemporary period, people, particularly in China, have held ambiguous understandings of drawing and of creative drawing. The root of this problem is that the important role of “creativity” in the process of drawing and painting is overlooked [7–9]. “Creativity” here refers to the conceptual, concrete and systematic creations that people make by borrowing the art form of drawing. It is therefore important to distinguish clearly between “the basic nature of drawing (work-in-progress drawing)” and “the creative nature of drawing (creative drawing)”. This question has prompted a new understanding and exploration of the ultimate significance of the artistic language of drawing [10–12]. Drawing is initially understood as basic modeling training and the artist’s pre-creation sketch: the initial stage of learning and mastering painting, the artist’s preparatory sketches and partial explorations, and the stage in which the complete creative concept is constructed; it is precisely in this basic modeling practice, reflection and exploration that the artist finds his or her own “creative” language [13–15].

The style migration technique has attracted much attention in the field of computer vision. It can transform the style of an image and draws on fields such as image processing, artificial intelligence and computer art. This research direction arose from the pursuit of novel approaches to image processing and art creation and has been driven by the continuous development of computer vision and machine learning [16–18]. Research on style migration has changed the role of computers in art creation and has been pushed further by the wave of deep learning. Using deep learning to assist the style migration and creation of drawing artworks can stimulate artists’ innovative thinking, provide a pioneering and learnable model of innovation for other forms of painting, and promote the development and prosperity of the art of painting [19–22].
In this paper, the Generative Adversarial Network (GAN) in deep learning is used as the substrate. The traditional generative adversarial network is improved, and the Cycle-Consistent Generative Adversarial Network (CycleGAN) is chosen as the method for image style migration of sketch works. The network model is optimized and its loss functions are designed, yielding a style migration model for sketch works based on an improved GAN. Experiments are designed to compare the algorithmic loss of this paper’s CycleGAN model with that of other image style migration models, to compare the parameter counts and running times of the models, and to test the running efficiency of the proposed CycleGAN model. In addition, subjective and objective evaluations of the style-migrated sketches generated by each model are conducted to assess the quality of the images generated by this paper’s CycleGAN model.
Zhu et al. first proposed the Cycle-Consistent Generative Adversarial Network (CycleGAN) to realize migration between unpaired (asymmetric) data [23–24]. CycleGAN has wide applicability and utility because it can accept unpaired training data and learn the mapping between the source and target domains. This ability allows CycleGAN to perform tasks such as image style migration without requiring paired training data. The model has a cyclic adversarial training structure: it is an image style migration model built from a pair of GAN networks, that is, two generators $G$ and $F$ and two discriminators $D_X$ and $D_Y$.

CycleGAN network architecture
As shown in part (a) of the figure above, the real image $x$ from the source domain $X$ is first fed into the generator $G$, which maps it to a generated image $G(x)$ in the target domain $Y$; the generator $F$ then maps $G(x)$ back to the source domain to reconstruct the image, while the discriminators $D_X$ and $D_Y$ judge whether the images in each domain are real or generated.
The adversarial loss function in part (c) of the figure is as follows:
$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\big[\log D_Y(y)\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big]$$
The cyclic consistency loss function is given in the following equation, using an $L_1$ norm:
$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\big[\big\|F(G(x)) - x\big\|_1\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\big\|G(F(y)) - y\big\|_1\big]$$
The resulting CycleGAN total loss function can be expressed as:
$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda\,\mathcal{L}_{cyc}(G, F)$$
where $\lambda$ controls the relative importance of the cyclic consistency term.
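To make these definitions concrete, the following is a minimal PyTorch sketch of the adversarial and cyclic consistency terms above. It is an illustrative reading of the equations, not the paper's released code, and it assumes discriminators that output probabilities in (0, 1).

```python
# Minimal sketch of the CycleGAN loss terms (illustrative, not the paper's code).
# G maps domain X to Y, F maps Y to X; D takes an image and returns probabilities.
import torch
import torch.nn.functional as nnf

def d_loss(D, real, fake):
    # Discriminator side of L_GAN: push D(real) -> 1 and D(fake) -> 0
    pred_real, pred_fake = D(real), D(fake.detach())
    return (nnf.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) +
            nnf.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake)))

def g_adv_loss(D, fake):
    # Generator side of L_GAN: make D label generated images as real
    pred_fake = D(fake)
    return nnf.binary_cross_entropy(pred_fake, torch.ones_like(pred_fake))

def cycle_loss(G, F, x, y):
    # L_cyc: L1 reconstruction after a full cycle, F(G(x)) ~ x and G(F(y)) ~ y
    return nnf.l1_loss(F(G(x)), x) + nnf.l1_loss(G(F(y)), y)
```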
A generator takes random noise as input and produces synthetic samples that resemble real training data [25]. A generator usually consists of one or more deep neural networks, often using convolutional layers to generate images or recurrent layers to generate sequential data. The output of the generator is fed to a discriminator, which is then trained to distinguish the generated samples from the real training data. The generator architecture is shown in Figure 2.

Generator architecture
The generator is a key building block in the CycleGAN architecture, and understanding its role and structure is critical to understanding the CycleGAN training process. The generator part of the architecture consists of three components: the latent space, the generator network, and the image generation part. The generator samples from the latent space and establishes a relationship between the latent space and the output. We then create a neural network that maps from the input (the latent space) to the output (in most cases, an image). During adversarial training, the generator and discriminator are connected in one model, and the generator is trained to produce images that are indistinguishable from real images. Ultimately, the generator produces the output image we see after the entire training process. Training CycleGAN focuses on training the generator, whereas in most architectures the discriminator requires several epochs of training before the generator training starts.
Each component of the CycleGAN architecture is defined as a class, and the generator class has three main parts: the class template, a loss function, and a buildModel function. The loss function is a custom loss used to train the model when needed, while the buildModel function builds the actual neural network model. Model-specific training routines are also included in this class, although the internal training methods may only be used for the discriminators.
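As an illustration of this organisation, the sketch below shows a hypothetical generator class whose buildModel function assembles a ResNet-style encoder-decoder (the structure used in the original CycleGAN paper) and whose loss method implements the adversarial term; the channel widths and block count are assumptions, not the paper's exact configuration.

```python
# Hypothetical generator class with buildModel and loss, as described above.
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels))

    def forward(self, x):
        return x + self.block(x)          # residual connection

class Generator(nn.Module):
    def __init__(self, in_ch=3, base=64, n_blocks=6):
        super().__init__()
        self.model = self.buildModel(in_ch, base, n_blocks)

    def buildModel(self, in_ch, base, n_blocks):
        # Encoder -> residual blocks -> decoder
        layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, base, 7),
                  nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
                  nn.Conv2d(base, base * 2, 3, stride=2, padding=1),
                  nn.InstanceNorm2d(base * 2), nn.ReLU(inplace=True)]
        layers += [ResnetBlock(base * 2) for _ in range(n_blocks)]
        layers += [nn.ConvTranspose2d(base * 2, base, 3, stride=2,
                                      padding=1, output_padding=1),
                   nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
                   nn.ReflectionPad2d(3), nn.Conv2d(base, in_ch, 7), nn.Tanh()]
        return nn.Sequential(*layers)

    def loss(self, D, fake):
        # Adversarial part of the generator objective
        pred = D(fake)
        return nn.functional.binary_cross_entropy(pred, torch.ones_like(pred))

    def forward(self, x):
        return self.model(x)
```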
The discriminator in the CycleGAN architecture acts as a deep neural network that distinguishes between real and fake images by producing scalar values between 0 and 1, indicating the probability that the input is real. It is trained to be an accurate binary classifier, minimizing the cross-entropy loss between its predictions and the true labels. The discriminator architecture usually includes a convolutional neural network (CNN), and it is trained on both real and generated data so that its training stays balanced with the generator's.
The discriminator is an important part of the CycleGAN architecture, as it acts as an adaptive loss function, learning and adapting to the underlying distribution of the data rather than relying on heuristic techniques [26]. It evaluates the veracity of real and generated images and gradually learns to distinguish between them, which in turn allows the generator to produce new, previously unseen data from the latent space. The generators are trained to minimize the logarithmic loss of the discriminator's output on the generated samples, aiming to produce realistic images while minimizing the difference between the generated data and the real training data. The training process of CycleGAN involves iteratively training the generator and the discriminator in an adversarial manner until convergence, thus generating new data that is similar to the training data. The discriminator architecture is shown in Figure 3.

Discriminator architecture
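For concreteness, the sketch below shows a discriminator of the kind described above: a small CNN whose output passes through a sigmoid so that every value lies in (0, 1). The channel widths and the patch-level output map are assumptions rather than the paper's exact configuration.

```python
# Minimal CNN discriminator sketch (illustrative; widths are assumptions).
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, in_ch=3, base=64):
        super().__init__()

        def block(cin, cout, norm=True):
            layers = [nn.Conv2d(cin, cout, 4, stride=2, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_ch, base, norm=False),
            *block(base, base * 2),
            *block(base * 2, base * 4),
            nn.Conv2d(base * 4, 1, 4, padding=1),   # patch-level real/fake scores
            nn.Sigmoid())                           # probabilities in (0, 1)

    def forward(self, x):
        return self.model(x)
```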
Based on the training process of the underlying network model, the loss function design used in this paper includes a forward mapping loss, a backward mapping loss, a cyclic consistency loss, a constant mapping loss, and an overall loss function.

Forward mapping loss. The loss function of the forward mapping $G: X \rightarrow Y$ with discriminator $D_Y$ is
$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\big[\log D_Y(y)\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big]$$
where $x$ and $y$ denote samples from the original sketch image domain $X$ and the style image domain $Y$, respectively.

Backward mapping loss. In the whole network model, in addition to the generative model $G$, the backward mapping $F: Y \rightarrow X$ with discriminator $D_X$ is constrained by
$$\mathcal{L}_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D_X(x)\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\log\big(1 - D_X(F(y))\big)\big]$$

Cyclic consistency loss. The cyclic consistency loss function is
$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\big[\big\|F(G(x)) - x\big\|_1\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\big\|G(F(y)) - y\big\|_1\big]$$
where $F(G(x))$ and $G(F(y))$ are the images reconstructed after a complete cycle.

Constant mapping loss. In the overall network model, the generative models $G$ and $F$ are also required to leave images that already belong to their target domain unchanged, giving the constant (identity) mapping loss
$$\mathcal{L}_{identity}(G, F) = \mathbb{E}_{y \sim p_{data}(y)}\big[\big\|G(y) - y\big\|_1\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[\big\|F(x) - x\big\|_1\big]$$
where $G(y)$ and $F(x)$ are the identity-mapped outputs.

Overall loss function. Based on the above, the overall function is
$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda_{cyc}\,\mathcal{L}_{cyc}(G, F) + \lambda_{id}\,\mathcal{L}_{identity}(G, F)$$
where $\lambda_{cyc}$ and $\lambda_{id}$ are weighting coefficients. By using the above loss functions in the CycleGAN network model, the method is able to achieve relatively stable stylization effects in the forward mapping from the original sketch image domain $X$ to the style image domain $Y$.
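Putting the pieces together, the following sketch shows one generator update step that combines the forward and backward adversarial terms, the cyclic consistency loss, and the constant mapping loss, reusing the networks and loss helpers sketched earlier. The weights λ_cyc = 10 and λ_id = 5 are assumptions in the spirit of the original CycleGAN, not the paper's reported settings.

```python
# Illustrative generator update combining all four loss terms defined above.
import itertools
import torch
import torch.nn.functional as nnf

G, F = Generator(), Generator()             # X -> Y and Y -> X (earlier sketch)
D_X, D_Y = Discriminator(), Discriminator()
lambda_cyc, lambda_id = 10.0, 5.0           # assumed weighting coefficients
opt_g = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))

def generator_step(x, y):
    fake_y, fake_x = G(x), F(y)
    loss = (g_adv_loss(D_Y, fake_y) + g_adv_loss(D_X, fake_x)            # adversarial
            + lambda_cyc * (nnf.l1_loss(F(fake_y), x)
                            + nnf.l1_loss(G(fake_x), y))                 # cycle
            + lambda_id * (nnf.l1_loss(G(y), y) + nnf.l1_loss(F(x), x))) # identity
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss.item()
```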
This experiment uses a computer configured with a 3.42 GHz CPU, 16 GB of RAM, and an NVIDIA GTX 2080 graphics card, and uses PyCharm to build the generative adversarial network, incorporating the improved processing module into the generator. Content images are selected from the ImageNet dataset, the Wiki-art dataset is chosen as the style images for the training set, the remaining images are used as the test set, and all images are resized to 256 × 256 for ease of processing. The model is implemented with the PyTorch deep learning framework on the Windows operating system. The network is optimized with mini-batch gradient descent, setting batchsize = 1 and a total of 200 epochs. The training time for the original model is about 13 hours, while for the model in this paper it is about 10 hours.
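The setup described above might be expressed as in the sketch below; the directory names and the minimal unpaired dataset class are placeholders, while the 256 × 256 resizing, batchsize = 1 and 200 epochs follow the text.

```python
# Sketch of the experimental setup (placeholder paths; parameters from the text).
import random
from pathlib import Path
from PIL import Image
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((256, 256)),                  # all images resized to 256 x 256
    transforms.ToTensor(),
    transforms.Normalize((0.5,) * 3, (0.5,) * 3)])

class UnpairedImageDataset(Dataset):
    # Pairs each content image with a randomly chosen style image.
    def __init__(self, content_dir, style_dir, transform):
        self.content = sorted(Path(content_dir).glob("*.jpg"))
        self.style = sorted(Path(style_dir).glob("*.jpg"))
        self.transform = transform

    def __len__(self):
        return len(self.content)

    def __getitem__(self, i):
        x = self.transform(Image.open(self.content[i]).convert("RGB"))
        y = self.transform(Image.open(random.choice(self.style)).convert("RGB"))
        return x, y

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loader = DataLoader(UnpairedImageDataset("data/content", "data/style", transform),
                    batch_size=1, shuffle=True)     # batchsize = 1

for epoch in range(200):                            # total epoch = 200
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # discriminator and generator updates go here, as sketched above
        pass
```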
The loss comparison of each model is shown in Fig. 4. Compared with other style migration algorithms, the CycleGAN-based algorithm in this paper converges faster, which means that the CycleGAN model can grasp the features of the image more quickly. In addition, this approach allows the model to extract more hidden high-level features from the image, which is why the CycleGAN algorithm shows lower loss than the previous algorithms over the range considered.

Loss comparison
To validate the performance of the algorithm, this section compares the proposed algorithm with several mainstream algorithms, including DPST, WCT2, PhotoNAS and PhotoWCT2. The test dataset is the PST dataset, which consists of 400 sketch images covering a rich variety of subjects and contents, including portraits, landscapes, still lifes, and other types. Testing comprehensively on these sketch images gives a fuller picture of the model's adaptability and robustness in various contexts.
In this subsection, the algorithmic model is tested comprehensively on images at three different resolutions, namely 1000 × 750, 750 × 562, and 500 × 375. In order to process the data efficiently, we randomly selected 300 images from the PST dataset, of which 150 were used as content images and the other 150 as style images. We then cropped these images to a size of 1000 × 750 and downsampled them to the two lower resolutions; finally, we averaged over the 150 content images to obtain the average style migration time shown in Table 1.
Model size and speed comparison
| Algorithm | Parameter quantity | Style migration time (s), 1000×750 | Style migration time (s), 750×562 | Style migration time (s), 500×375 |
|---|---|---|---|---|
| DualGAN | 20.68M | 101.66 | 96.40 | 88.47 |
| CGGAN | 16.92M | 55.78 | 44.61 | 42.79 |
| StyleGAN | 14.55M | 90.33 | 61.15 | 50.73 |
| AdaIN | 8.86M | 23.49 | 22.06 | 17.95 |
| AdaAttIN | 9.04M | 156.85 | 74.74 | 51.66 |
| IEST | 12.01M | 14.79 | 13.16 | 7.76 |
| LapStyle | 12.47M | 98.51 | 94.98 | 36.83 |
| SANet | 20.75M | 119.05 | 116.25 | 33.49 |
| CCPL | 5.69M | 11.42 | 10.46 | 8.28 |
| SSTR | 13.37M | 23.29 | 16.97 | 12.12 |
| ArtFlow | 8.63M | 7.98 | 6.05 | 5.55 |
| Ours | 1.02M | 3.42 | 2.19 | 1.72 |
Compared with the other models, the CycleGAN model in this paper has the smallest number of parameters (1.02M). In addition, this paper's model shows the fastest migration speed (3.42 s, 2.19 s, 1.72 s) on images of every resolution. The CycleGAN algorithm in this paper therefore effectively reduces the number of model parameters and improves the running speed of the model.
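For reference, the average style migration times reported in Table 1 could be measured with a routine like the one below; the timing helper is an assumption, since the paper does not describe its measurement code.

```python
# Sketch: average per-image stylization time over a list of preprocessed tensors.
import time
import torch

@torch.no_grad()
def average_transfer_time(model, images, device="cuda"):
    times = []
    for img in images:                        # e.g. the 150 test images at one resolution
        img = img.unsqueeze(0).to(device)
        if device == "cuda":
            torch.cuda.synchronize()          # make sure prior GPU work has finished
        start = time.perf_counter()
        _ = model(img)                        # one forward pass = one stylization
        if device == "cuda":
            torch.cuda.synchronize()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```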
The evaluation of the effectiveness of stylized migration of sketch images is, to an extent, subjective, so a user survey was conducted to evaluate the performance of the various algorithms in terms of content quality, stylization strength, and likability. We chose the PST dataset as the baseline dataset, randomly selected 50 sketch images as content images and 50 images as style images, and fed them into the 12 algorithms. A total of 600 generated images were shown to 50 testers, who were asked to select, for each of the three criteria (content quality, stylization strength, and likability), the three images that best met the criterion. The percentage of each algorithm's outputs among the 150 responses per criterion is shown in Table 2. Compared with the other algorithms, the proposed CycleGAN algorithm is well ahead in content quality (64.58%), stylization strength (65.73%), and likability (60.09%), and is preferred by the testers by a large margin.
The results of the user study
| Algorithm | Content quality (%) | Stylization strength (%) | Likability (%) |
|---|---|---|---|
| DualGAN | 1.34 | 2.31 | 3.79 |
| CGGAN | 3.72 | 2.17 | 3.59 |
| StyleGAN | 4.88 | 1.33 | 2.13 |
| AdaIN | 1.01 | 3.04 | 4.33 |
| AdaAttIN | 3.37 | 3.87 | 1.81 |
| IEST | 2.59 | 4.28 | 0.96 |
| LapStyle | 3.14 | 5.84 | 6.13 |
| SANet | 4.69 | 2.88 | 4.75 |
| CCPL | 3.66 | 3.89 | 5.41 |
| SSTR | 2.94 | 0.99 | 4.75 |
| ArtFlow | 4.08 | 3.67 | 2.26 |
| Ours | 64.58 | 65.73 | 60.09 |
In order to validate the effectiveness of the method and objectively evaluate the difference between this paper's method and the other methods, structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) are used for evaluation. The deep learning algorithms are used to migrate two original sketch works into each target style, and the SSIM and PSNR of the images under each style are evaluated; the results are shown in Table 3. Five migrated styles are considered: Rubens, Dürer, Zorn, Menzel, and Seurat.
Evaluation results
| SSIM | Rubens | Dürer | Zorn | Menzel | Seurat |
|---|---|---|---|---|---|
| DualGAN | 0.444 | 0.513 | 0.191 | 0.471 | 0.377 |
| CGGAN | 0.433 | 0.517 | 0.172 | 0.458 | 0.226 |
| StyleGAN | 0.474 | 0.545 | 0.128 | 0.398 | 0.244 |
| AdaIN | 0.464 | 0.552 | 0.193 | 0.451 | 0.371 |
| AdaAttIN | 0.478 | 0.571 | 0.243 | 0.405 | 0.298 |
| IEST | 0.452 | 0.487 | 0.182 | 0.431 | 0.238 |
| LapStyle | 0.419 | 0.503 | 0.136 | 0.367 | 0.355 |
| SANet | 0.434 | 0.486 | 0.254 | 0.472 | 0.281 |
| CCPL | 0.409 | 0.524 | 0.216 | 0.401 | 0.346 |
| SSTR | 0.418 | 0.556 | 0.214 | 0.451 | 0.358 |
| ArtFlow | 0.485 | 0.495 | 0.242 | 0.409 | 0.384 |
| Ours | 0.498 | 0.576 | 0.257 | 0.484 | 0.392 |

| PSNR | Rubens | Dürer | Zorn | Menzel | Seurat |
|---|---|---|---|---|---|
| DualGAN | 11.075 | 12.213 | 8.684 | 10.956 | 10.288 |
| CGGAN | 10.863 | 12.504 | 7.971 | 10.406 | 10.608 |
| StyleGAN | 12.343 | 14.297 | 7.707 | 11.523 | 10.346 |
| AdaIN | 12.424 | 14.153 | 8.991 | 12.178 | 9.584 |
| AdaAttIN | 10.507 | 12.897 | 8.179 | 11.574 | 10.471 |
| IEST | 13.572 | 13.591 | 8.999 | 10.275 | 9.634 |
| LapStyle | 10.772 | 13.091 | 8.224 | 10.393 | 10.974 |
| SANet | 13.277 | 13.841 | 7.662 | 13.092 | 9.917 |
| CCPL | 13.108 | 11.071 | 8.367 | 11.243 | 9.308 |
| SSTR | 11.381 | 13.022 | 8.426 | 12.418 | 10.649 |
| ArtFlow | 10.866 | 13.301 | 8.027 | 10.567 | 9.531 |
| Ours | 13.808 | 14.573 | 9.087 | 13.607 | 11.001 |
From the data in Table 3, it can be clearly seen that the CycleGAN model used in this paper achieves SSIM values of 0.498, 0.576, 0.257, 0.484, and 0.392 for the five styles of Rubens, Dürer, Zorn, Menzel, and Seurat, respectively, and PSNR values of 13.808, 14.573, 9.087, 13.607, and 11.001, respectively; both metrics reach their maximum values for every style. This means that the images generated by the CycleGAN method are the closest to the original images in terms of structure, and the quality of the images produced by this paper's CycleGAN method is significantly better than that of the other methods. In the comparison of the evaluation indexes, the CycleGAN method of this paper exceeds the comparison methods on all the image style transfer data. This is reflected not only in the numerical values but also, more intuitively, in the visual quality of the generated images, which is better for this paper's method. The method presented in this paper is therefore feasible and effective.
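The SSIM and PSNR scores above could be reproduced with standard implementations such as scikit-image (an assumption; the paper does not name the implementation it uses, and the channel_axis argument requires scikit-image 0.19 or newer):

```python
# Sketch: SSIM and PSNR between an original sketch and its stylized counterpart.
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(original, stylized):
    # Both inputs are uint8 RGB arrays of identical shape (H, W, 3).
    ssim = structural_similarity(original, stylized, channel_axis=-1, data_range=255)
    psnr = peak_signal_noise_ratio(original, stylized, data_range=255)
    return ssim, psnr
```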
The article constructs a CycleGAN style migration model for sketches by improving the generative adversarial network, migrating sketches into different artistic styles and thereby assisting artistic creation. The effectiveness of this paper's algorithm is verified by comparing other style migration algorithms with this paper's CycleGAN algorithm in terms of algorithmic loss, model efficiency, and subjective and objective evaluations of the style-migrated images.
The CycleGAN algorithm in this paper converges faster than the other image style migration algorithms. The model in this paper has 1.02M parameters, the smallest parameter count among all the image style migration models. On images at 1000 × 750, 750 × 562 and 500 × 375 resolutions, the style migration times of the CycleGAN model in this paper are 3.42 s, 2.19 s and 1.72 s, respectively, and its running efficiency is much higher than that of the other models. In the subjective evaluation of the style-migrated sketch images, the content quality, stylization strength and likability of this paper's CycleGAN model are 64.58%, 65.73% and 60.09%, respectively, all above 60%, giving the best subjective evaluation results. In the objective evaluation, the SSIM values of this paper's CycleGAN model for the five styles are 0.498, 0.576, 0.257, 0.484, and 0.392, and the PSNR values are 13.808, 14.573, 9.087, 13.607, and 11.001, respectively, all of which are the best results among all models.