A study on the visual effect and user response of infomercials based on neural network analysis

With the development and wide utilization of electronic information media, people’s lives are more and more dependent on electronic media, which also affects the development of the advertising industry to a large extent, and there is no doubt about the powerful function of electronic information media as the main facility of modern advertising dissemination [1]. Although advertisers are utilizing different Internet platforms to place an increasing amount of infomercials, the information delivered to users through the advertisements may not necessarily increase exponentially [2–3]. Therefore, it is necessary to study how infomercials can better stimulate users to pay attention to browsing and improve the effect of advertisement dissemination and economic benefits.

Infomercial refers to a form of advertisement that exists in social media, information media or audiovisual media, and the common types are open screen advertisement, native advertisement, incentive advertisement and video advertisement [4–5]. Relying on big data technology, infomercials can be accurately delivered according to users’ reading habits and software usage habits, and embedded in the information content that users browse, subscribe to and pay attention to on a daily basis [6–7]. Visual marketing theory suggests that content that is closely associated with numbers and symbols will always attract more attention from users in general [8–9]. Infomercials produce strong visual effects to attract the visual attention of target audiences by emphasizing the design of tabs, texts, etc., so as to achieve the purpose of advertising [10–12]. In addition, by virtue of its unique form of communication, infomercials can enable the advertisements to provide valuable information to users without destroying their experience, reducing their perception of the intrusiveness of the advertisements and enhancing the smoothness of their information processing [13–15]. Based on this, we further study the enhancement path of information flow ads in terms of visual effect and user response, in order to provide relevant theoretical support for the improvement of the placement effect of information flow ads.

The most representative features of infomercials are originality and proactivity. Originality is the ability of infomercials to capture users’ consumption characteristics and, after reasonable design, integrate them effectively into the ad content. Alamäki, A. et al. studied the factors that make mobile marketing video advertisements more efficient at triggering consumers’ behavioral intentions. Although the media richness of the videos did trigger changes in consumer behavior, the triggering mechanisms differed across video types, and the location of the consumer and plot design based on visual cues strongly influenced consumer engagement behavior [16]. Boscolo, J. C. et al. analyzed gender differences in users’ visual attention and attitudes towards different types of advertisements and showed that, compared with women, men’s visual attention to images significantly affects their attitudes towards the advertisements, indicating that the visual image of an advertisement interferes with the user’s response to a certain extent [17]. Jayawardena, N. S. et al. explored adolescents’ visual comprehension and memory of 360-degree video advertisements and found that females are better able to comprehend colors in visual images while males are better able to comprehend facial expressions, providing a theoretical basis for ad publishers to design advertisements that appeal to consumers’ visual memory [18]. Kusumasondjaja, S. emphasized the strategic importance of a brand’s aesthetic image and presentation for Instagram ad content, finding that expressive aesthetics in audiovisual ad content and classical aesthetics in purely visual ad content lead to a more interactive consumer response [19].

Proactivity means that infomercials provide users with content matching their interests and preferences, and accurately push ad content incorporating product or service information when it is likely to bring a positive consumer response. Kim, T. et al. investigated the effect of disclosing advertisement information flows on advertisement effectiveness and found that disclosing flows that consumers find unacceptable decreases effectiveness, whereas revealing acceptable flows on platforms trusted by consumers increases it [20]. Lee, D. et al. described the impact of social media advertising and marketing content on customer engagement. In general, direct message content on its own decreases consumer engagement, but when combined with attributes such as brand personality it increases engagement; in particular, direct message content such as discount promotions encourages consumers to complete conversions [21]. Mayrhofer, M. et al. called on policymakers to adopt advertising disclosures in user-generated content, showing that user-generated content not only fails to trigger persuasion knowledge but also increases other users’ willingness to buy, and that users’ heightened attentiveness to this form of advertising reduces their negative affect toward the advertisements, making it a form of covert advertising content [22]. Gavilanes, J. M. et al. argued that the type of social network advertisement is a prerequisite for influencing digital consumer engagement; they constructed a model that validates significant differences among content categories in their influence on consumer engagement and, based on successful content case studies, confirms the intermediary effect of consumer response in the advertising and marketing process [23]. Tran, T. P. developed a comprehensive model capable of capturing users’ attitudinal and behavioral responses to personalized advertisements on Internet platforms and further segmented users into three markets, namely, ad lovers, ad panderers, and ad aversives, which can help ad publishers in designing and managing ad content [24].

In this paper, we collect Facebook infomercial images as a research dataset and use a generalized regression neural network model to explore whether three types of visual elements (element color, shape, brightness, etc.; element position, size, number, etc.; and style selection and design, etc.) affect the advertising effect by influencing social presence and the social facilitation effect. Then, a collaborative attention model that simulates the attention mechanism of the human eye is constructed by combining the text features and visual features in the infomercial images. These features are weighted by the collaborative attention mechanism to calculate the user’s visual attention, so as to judge the user’s response to the features of the infomercial.

2 Advertising visual effect prediction model

The definition of the infomercial as a form of advertising was first clarified at international advertising conferences and gradually deepened in subsequent academic research and industry practice. The “Infomercial Handbook” issued by the Interactive Advertising Bureau (IAB) of the United States defines it authoritatively, emphasizing its natural integration with the media page, the fit of the ad design to the characteristics of the media, and a high degree of consistency with the platform’s user behavior. This form of advertising originated on the social platform Facebook, was later introduced by platforms such as Phoenix.com, and spread rapidly across search engines, short videos and information media, showing diversified display forms. Through fine-grained analysis of user behavior, infomercials can precisely target their audience. Innovative ad formats then integrate the content into the information streams that users are browsing or have subscribed to, achieving efficient placement and precise reach. This comprehensive research perspective not only enriches the theoretical connotation of infomercials, but also provides more scientific guidance for practical application.

2.1 Generalized regression neural network

The algorithmic theory of generalized regression neural networks is based on nonlinear regression analysis. Let f(x, y) be the joint probability density of the two random variables x and y. If the observed value of x is x₀, then the regression of y on x, i.e., the conditional mean of y given x₀, can be expressed as:

$$E(y \mid x_{0}) = y(x_{0}) = \frac{\int_{-\infty}^{+\infty} y\, f(x_{0}, y)\, dy}{\int_{-\infty}^{+\infty} f(x_{0}, y)\, dy}$$

where y(x₀) is the predicted value of y when the input is x₀. Applying the Parzen nonparametric estimation method, the density function f(x₀, y) can be estimated from the sample data set $\{x_{i}, y_{i}\}_{i=1}^{n}$ as:

$$f(x_{0}, y) = \frac{1}{n (2\pi)^{\frac{p+1}{2}} \sigma^{p+1}} \sum_{i=1}^{n} e^{-d(x_{0}, x_{i})}\, e^{-d(y, y_{i})} \tag{1}$$

where $d(x_{0}, x_{i}) = \sum_{j=1}^{p} [(x_{0j} - x_{ij})/\sigma]^{2}$ and $d(y, y_{i}) = [y - y_{i}]^{2}$, n is the sample size, p is the dimension of the random variable x, and σ is the smoothing factor, i.e., the standard deviation of the Gaussian function. Substituting (1) into the regression expression gives:

$$y(x_{0}) = \frac{\sum_{i=1}^{n} \left( e^{-d(x_{0}, x_{i})} \int_{-\infty}^{+\infty} y\, e^{-d(y, y_{i})}\, dy \right)}{\sum_{i=1}^{n} \left( e^{-d(x_{0}, x_{i})} \int_{-\infty}^{+\infty} e^{-d(y, y_{i})}\, dy \right)} \tag{2}$$

Since $\int_{-\infty}^{+\infty} x e^{-x^{2}}\, dx = 0$, the substitution $y = y_{i} + t$ gives $\int_{-\infty}^{+\infty} y\, e^{-d(y, y_{i})}\, dy = y_{i} \int_{-\infty}^{+\infty} e^{-t^{2}}\, dt$. The remaining integral is the same in the numerator and the denominator and cancels, so (2) simplifies to:

$$y(x_{0}) = \frac{\sum_{i=1}^{n} y_{i}\, e^{-d(x_{0}, x_{i})}}{\sum_{i=1}^{n} e^{-d(x_{0}, x_{i})}} \tag{3}$$

This is the predicted value of y if the input is x₀.

From the predicted value formula, it can be seen that the numerator is the weighted sum of the y_i of all samples, where the weight of sample i is e^{–d(x₀,x_i)}. Another point of interest is the value of the smoothing factor σ, which has a great impact on the performance of the network. A generalized regression neural network does not need iterative training: the connection weights of its neurons are never changed, and finding the optimal model only requires searching for the optimal value of the smoothing factor. In general, if the value of σ is very large, then d(x₀, x_i) tends to 0, and the predicted value y(x₀) is approximately equal to the mean of the dependent variable over all samples. If the value of σ tends to 0, then the predicted value y(x₀) is very close to the sample value at each training point, but for inputs outside the sample the prediction can be far off, which is known as the overlearning phenomenon.
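The two limiting cases of σ can be checked numerically. The following is a minimal NumPy sketch of Eq. (3), not the authors' implementation; the function name `grnn_predict` and the toy data are assumptions made for illustration.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma):
    """Eq. (3): kernel-weighted average of the training targets."""
    x_train = np.asarray(x_train, dtype=float)                 # shape (n, p)
    y_train = np.asarray(y_train, dtype=float)                 # shape (n,)
    x_query = np.atleast_2d(np.asarray(x_query, dtype=float))  # shape (m, p)
    # d(x0, xi) = sum_j ((x0j - xij) / sigma)^2, as defined in the text
    diff = (x_query[:, None, :] - x_train[None, :, :]) / sigma
    d = np.sum(diff ** 2, axis=2)                              # shape (m, n)
    # Shift by the row minimum before exponentiating: the common factor
    # cancels in the ratio and avoids 0/0 when sigma is very small.
    w = np.exp(-(d - d.min(axis=1, keepdims=True)))
    return (w @ y_train) / w.sum(axis=1)

# Toy data (hypothetical, for illustration only)
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 4.0, 9.0])
print(grnn_predict(x, y, [[1.0]], sigma=1e3))   # close to y.mean() = 3.5
print(grnn_predict(x, y, [[1.0]], sigma=0.05))  # close to the stored value 1.0
```

With a very large σ all weights are nearly equal and the prediction collapses to the sample mean; with a very small σ the nearest stored sample dominates, which is the overlearning behavior described above.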

2.2 Determination of the smoothing factor

As noted above, the generalized regression neural network does not need iterative training; the network is optimized solely by adjusting the value of the smoothing factor σ. The smoothing factor is increased in steps of Δσ over the range (σ_min, σ_max). For each candidate value, a few groups of samples are removed from the learning samples, the network built from the remaining samples is used to predict the removed samples, and the mean square error between the predicted values and the actual sample values is computed:

$$E = \frac{1}{n} \sum_{i=1}^{n} [\hat{y}_{i}(x_{i}) - y_{i}]^{2} \tag{4}$$

When the mean square error E is used as the performance evaluation metric of the generalized regression neural network, a smaller value of E indicates better network performance. In practical applications, the root mean square error (RMSE) is also often used as an evaluation metric for the network, i.e.:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} [\hat{y}_{i}(x_{i}) - y_{i}]^{2}} \tag{5}$$
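The search over σ can be sketched as a simple grid search with a held-out group of samples. This is an illustrative sketch under stated assumptions rather than the procedure used in the paper: the helper names, the synthetic data, the choice of held-out indices, and the grid bounds are all hypothetical.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma):
    # Kernel-weighted average with d(x0, xi) = sum_j ((x0j - xij) / sigma)^2
    d = np.sum(((x_query[:, None, :] - x_train[None, :, :]) / sigma) ** 2, axis=2)
    w = np.exp(-(d - d.min(axis=1, keepdims=True)))  # shift for numerical stability
    return (w @ y_train) / w.sum(axis=1)

def select_sigma(x, y, holdout_idx, sigma_min, sigma_max, delta):
    """Return (best_sigma, best_rmse) over sigma_min, sigma_min + delta, ..., sigma_max."""
    mask = np.zeros(len(x), dtype=bool)
    mask[holdout_idx] = True                      # samples removed from the learning set
    best_sigma, best_rmse = None, np.inf
    for sigma in np.arange(sigma_min, sigma_max + delta / 2, delta):
        pred = grnn_predict(x[~mask], y[~mask], x[mask], sigma)
        rmse = np.sqrt(np.mean((pred - y[mask]) ** 2))   # Eq. (5) on the removed samples
        if rmse < best_rmse:
            best_sigma, best_rmse = float(sigma), float(rmse)
    return best_sigma, best_rmse

# Synthetic data for illustration only (not the paper's dataset)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 60).reshape(-1, 1)
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(60)
sigma, rmse = select_sigma(x, y, holdout_idx=np.arange(0, 60, 5),
                           sigma_min=0.05, sigma_max=2.0, delta=0.05)
print(sigma, rmse)
```

Because only one scalar is tuned, an exhaustive grid is usually affordable; repeating the search with different held-out groups would give a more stable estimate of the optimal σ.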

2.3 Structure of the generalized regression neural network

Generalized regression neural networks are a special form of radial basis neural networks and therefore share some neuron-level similarities with them. The input vector of the network is [x_n.1, x_n.2, …, x_n.M], the center vector is [c_j.1, c_j.2, …, c_j.M], ||dist|| denotes the Euclidean distance between the input vector and the center vector, b is the threshold, and f is the transfer function, which is usually a Gaussian function, i.e., a normal distribution function.

The network consists of four layers, namely, the input layer, pattern layer, summation layer and output layer, where the network input is X = [x₁, x₂, …, x_n]^T and the corresponding output is Y = [y₁, y₂, …, y_k]^T. The number of neurons in the input layer equals the dimension of the input vector, and the number of neurons in the pattern layer equals the number of learning samples.

The input layer is the sample input layer. Its number of neurons equals the dimension of the input vector in the learning samples. Each neuron is a simple distribution unit with a linear transfer function, which passes the input variables on to the second layer of the network, i.e., the pattern layer.

The neurons in the pattern layer are radial basis neurons. Their number equals the number of learning samples, and each radial basis neuron corresponds to a different sample. The transfer function of the radial basis neuron is:

$$p_{i} = \exp \left[ - \frac{(x - x_{i})^{T} (x - x_{i})}{2 \sigma^{2}} \right], \quad i = 1, 2, \dots, n \tag{6}$$

where x is the input variable of the generalized regression neural network and x_i is the learning sample corresponding to the i-th neuron. In other words, the output of each neuron is the exponential of the negative squared Euclidean distance between the input variable and the corresponding sample, scaled by 2σ².

The summation layer, the third layer of the generalized regression neural network, contains two types of summation neurons.

The formula for the first type is:

$$\sum_{i=1}^{n} \exp \left[ - \frac{(X - X_{i})^{T} (X - X_{i})}{2 \sigma^{2}} \right] \tag{7}$$

This type computes the arithmetic sum of the outputs of all pattern-layer neurons. The connection weight between each pattern-layer neuron and this summation neuron is 1, and the transfer function is:

$$S_{D} = \sum_{i=1}^{n} P_{i} \tag{8}$$

The formula for the second type is:

$$\sum_{i=1}^{n} Y_{i} \exp \left[ - \frac{(X - X_{i})^{T} (X - X_{i})}{2 \sigma^{2}} \right] \tag{9}$$

This type computes a weighted sum of the outputs of all pattern-layer neurons. The connection weight between the i-th neuron in the pattern layer and the j-th numerator summation neuron in the summation layer is the j-th element of the i-th output sample Y_i, and the transfer function is:

$$S_{Nj} = \sum_{i=1}^{n} y_{ij} P_{i}, \quad j = 1, 2, \dots, k \tag{10}$$

The number of neurons in the output layer equals the dimension k of the output vector in the learning samples. Each output neuron divides the corresponding numerator sum by the denominator sum from the summation layer, and the output of neuron j corresponds to the j-th element of the estimation result Ŷ(X):

$$y_{j} = \frac{S_{Nj}}{S_{D}}, \quad j = 1, 2, \dots, k \tag{11}$$
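The four layers map directly onto a few array operations. The sketch below follows Eqs. (6), (8), (10), and (11) for a single input and a k-dimensional output; the function name `grnn_forward` and the three toy samples are hypothetical.

```python
import numpy as np

def grnn_forward(x, x_samples, y_samples, sigma):
    """One forward pass through the four GRNN layers for a single input x."""
    # Input layer: linear transfer, x is passed on unchanged.
    x = np.asarray(x, dtype=float)                               # shape (M,)
    # Pattern layer, Eq. (6): one radial basis neuron per learning sample.
    diff = x_samples - x                                         # shape (n, M)
    p = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * sigma ** 2))  # shape (n,)
    # Summation layer, Eq. (8): denominator neuron with unit weights.
    s_d = p.sum()
    # Summation layer, Eq. (10): k numerator neurons weighted by y_ij.
    s_n = y_samples.T @ p                                        # shape (k,)
    # Output layer, Eq. (11): divide each numerator sum by the denominator.
    return s_n / s_d

# Three learning samples with M = 2 inputs and k = 2 outputs (toy values)
x_samples = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y_samples = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(grnn_forward([0.0, 0.0], x_samples, y_samples, sigma=0.1))  # close to [1, 0]
print(grnn_forward([0.5, 0.5], x_samples, y_samples, sigma=1.0))  # equidistant query
```

The second query is equidistant from all three samples, so every pattern neuron fires equally and the output is the plain average of the stored targets. Note that this direct form returns NaN if σ is so small that every p_i underflows to zero.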

3 Visual saliency prediction related techniques

3.1 Construction process

Visual saliency prediction, as a key image analysis technique in the field of computer vision, can mimic the human eye attention mechanism to quickly capture the focused and intriguing regions in an image. In this paper, we propose a collaborative attention network of text features and visual features to solve the problem of visual saliency prediction for infomercials.

This subsection outlines how the visual and textual features of advertisement images are combined and weighted through the collaborative attention mechanism. The construction process of the advertisement visual saliency prediction method is shown in Fig. 1.

The collaborative attention model (CAM) contains five modules: an input module, a visual feature learning module, a text feature learning module, a collaborative attention module, and a click rate prediction module. The input module takes in the visual and textual information of the target advertisement image. The visual feature learning module learns the visual features of the image, such as color and hue. The text feature learning module segments the textual information into a sequence of phrases, uses the pre-trained word embedding model word2vec to generate a low-dimensional word vector for each phrase, and then feeds these word vectors to a Bi-LSTM to generate the corresponding text features. The collaborative attention module learns the attention weights of the text features and visual features in pairs: it processes both kinds of features, generates an attention weight for each feature, and combines these weights to determine the effect of each feature on the output. This lets the model pay more attention to key visual and textual information and thus improves its performance. Finally, the prediction module passes the textual and visual features weighted by the collaborative attention mechanism through a multilayer perceptron to obtain the final prediction. Working together, these modules let the CAM draw on several aspects of the advertisement image, improving prediction accuracy.

3.2 Collaborative Attention Module

The visual features $G \in R^{d \times N_{g}}$ and textual features $V \in R^{d \times N_{v}}$ (i.e., the spliced vectors of all phrases) of the advertisement image x_i, extracted by the two feature learning modules, are used as inputs to the collaborative attention layer. The similarity matrix $C \in R^{N_{g} \times N_{v}}$ is then computed as:

$$C = \tanh (G^{T} W_{b} V) \tag{12}$$

where $W_{b} \in R^{d \times d}$ is the corresponding weight matrix.

By obtaining the row and column maxima of the similarity matrix and weighting the original textual features V and visual features G with them, a standard collaborative attention vector can be derived. However, in this paper, we treat the similarity matrix as a feature and compute the textual feature and visual feature attention matrices by the following formulas:

$$H^{v} = \tanh (W_{v} V + (W_{g} G) C) \tag{13}$$

$$a^{v} = \mathrm{softmax} (w_{hv}^{T} H^{v}) \tag{14}$$

$$H^{g} = \tanh (W_{g} G + (W_{v} V) C^{T}) \tag{15}$$

$$a^{g} = \mathrm{softmax} (w_{hg}^{T} H^{g}) \tag{16}$$

where $W_{v}, W_{g} \in R^{k \times d}$ and $w_{hv}, w_{hg} \in R^{k}$ are the corresponding weights, and $a^{v} \in R^{N_{v}}$ and $a^{g} \in R^{N_{g}}$ are the attention weights for each textual feature v_i and each image region visual feature g_j, respectively. In Eq. (13), the similarity matrix C maps the visual feature space to the text feature space; conversely, in Eq. (15), C^T maps the text feature space to the visual feature space. Based on the above attention weights, the final textual feature and final visual feature attention vectors are obtained by calculating the weighted sum of each textual feature and each visual feature, i.e.:

$$\hat{v} = \sum_{i=1}^{N_{v}} a_{i}^{v} v_{i}, \quad \hat{g} = \sum_{j=1}^{N_{g}} a_{j}^{g} g_{j} \tag{17}$$
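The shapes in Eqs. (12) to (17) are easy to get wrong, so a small NumPy sketch may help. It uses random, untrained weights purely to check dimensions; the function name `co_attention`, the sizes d, k, N_g, N_v, and the random inputs are assumptions, and a real model would learn the weights by backpropagation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def co_attention(G, V, W_b, W_g, W_v, w_hg, w_hv):
    """G: (d, N_g) visual features, V: (d, N_v) text features."""
    C = np.tanh(G.T @ W_b @ V)                 # Eq. (12), shape (N_g, N_v)
    H_v = np.tanh(W_v @ V + (W_g @ G) @ C)     # Eq. (13), shape (k, N_v)
    a_v = softmax(w_hv @ H_v)                  # Eq. (14), shape (N_v,)
    H_g = np.tanh(W_g @ G + (W_v @ V) @ C.T)   # Eq. (15), shape (k, N_g)
    a_g = softmax(w_hg @ H_g)                  # Eq. (16), shape (N_g,)
    v_hat = V @ a_v                            # Eq. (17), shape (d,)
    g_hat = G @ a_g                            # Eq. (17), shape (d,)
    return a_v, a_g, v_hat, g_hat

# Hypothetical sizes and random inputs, for shape checking only
rng = np.random.default_rng(0)
d, k, n_g, n_v = 8, 6, 5, 4
G = rng.standard_normal((d, n_g))
V = rng.standard_normal((d, n_v))
W_b = 0.1 * rng.standard_normal((d, d))
W_g = 0.1 * rng.standard_normal((k, d))
W_v = 0.1 * rng.standard_normal((k, d))
w_hg = rng.standard_normal(k)
w_hv = rng.standard_normal(k)
a_v, a_g, v_hat, g_hat = co_attention(G, V, W_b, W_g, W_v, w_hg, w_hv)
print(a_v.sum(), a_g.sum(), v_hat.shape, g_hat.shape)
```

Each attention vector sums to 1, so v̂ and ĝ are convex combinations of the phrase features and the region features, respectively.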

4 Research on the visual effect and user response of advertisements

4.1 Visual effect prediction for infomercials

4.1.1 Sources of advertising datasets

The experiments in this paper use a private, real-world dataset of Facebook infomercial image ads. We collected all the data from November 21 to 23, 2023, containing 100,000 samples involving 56,000 creative images. The fields of the Facebook infomercial dataset are summarized in Table 1. In this image ad dataset, each sample contains three kinds of information: (1) basic features of the ad image, such as the advertiser, ad position, ad category, and the corresponding creative attributes, where the creative attributes describe the creative image in terms of format, name, width, height, and URL; (2) the creative image corresponding to the advertisement; (3) the real click-through rate of the advertisement image. The visual design of an advertisement can be divided into three layers from bottom to top: the visual layer (element colors, shapes, brightness, etc.), the spatial layer (element positions, sizes, quantities, etc.), and the script layer (style selection and design, etc.). Inspired by this, we selected 1000 advertisement images from three dimensions and eight attribute directions, including (calmness, excitement), (retro, fashion), (aesthetic, functional) and (emotional, rational). The sample images were initially screened and categorized by their most distinctive features. This categorization served only as a reference for the diversity of the images included in the dataset; the final ground-truth image labels were determined by the subjects in subsequent experiments.

Table 1. Detailed data format for image advertising data sets

Serial number	Attribute field	Sample	Description
1	AdvertiserID	1046141523	Advertiser ID
2	AdvertiserName	Pinkgirl	Advertiser name
3	adzoneID	34258714	Ad position ID
4	adzoneName	Wireless - traffic package - online shopping - mobile phone focus	Ad position name
5	maincateID	2	Ad category ID
6	maincateName	Women’s wear	Ad category name
7	Date	2023-11-25	Date
8	Ctr	6	Click-through rate (%)
9	aboardID	842508150003	Creative ID
10	adboardFormat	2	Creative format
11	adboardName	Good series	Creative name
12	adboardWidth	642	Creative width
13	adboardHeight	205	Creative height
14	imageUrl	TB1CfzpL5TBuNj SspmSuuDRVXa	Creative image URL
15	templateID	2000000000	Template ID

4.1.2 Regression analysis of visual effects

Using the collected advertisement placement data and neural network regression analysis, we analyzed the visual elements of advertisements to explore whether the three visual types predict advertising effect. The mechanism of action examined in this study is that the visual elements of advertisements predict advertising effect by predicting social presence and the social facilitation effect.

The descriptive statistics are shown in Table 2. Among the three visual types, element color, shape, brightness, etc. is used the most on average, with a mean value of 5.12, a maximum value of 9, and a minimum value of 1. Element position, size, and quantity is used the second most, with a mean value of 3.85, a maximum value of 8, and a minimum value of 1. Style selection and design is used the least, with a mean value of 2.74, a maximum value of 6, and a minimum value of 1. In terms of variance, element color, shape, brightness, etc. has the highest variance, 3.15, and style selection and design has the lowest, 1.32.

Table 2. Descriptive statistics of variables

Variable class	Variable index	Maximum value	Minimum value	Mean	Variance
Visual layer	Element color, shape, brightness	9	1	5.12	3.15
Space layer	Element position, size, quantity	8	1	3.85	2.06
Script layer	Style selection and design	6	1	2.74	1.32
Social presence	Thumb up	1094	1	63.56	14012.45
Social facilitation effect	Comment number	125	1	6.84	186.63
Social facilitation effect	Forwarding number	258	2	9.46	552.32
Advertising effect	Click quantity	9445	5	456.12	1060512

Overall, the three types of visual elements have their own characteristics. Element color, shape, brightness, etc. has both the largest mean and the largest variance: it is used the most on average, and its use differs the most across advertisements. Style selection and design has the lowest variance. Among the four indicators of user behavior (likes, comments, forwards, and clicks), the number of comments has a maximum of 125, a minimum of 1, and a mean of 6.84. Its variance of 186.63 is the smallest of the four indicators, which indicates that users’ commenting behavior on the ads is more homogeneous than the other three behaviors. The number of clicks has the largest variance, 1060512, indicating that users’ clicking behavior on advertisements differs the most.

The regression results are shown in Table 3. Style selection and design in advertisements significantly predicted social presence (standardized coefficient = 0.772, P = 0.001), as did element color, shape, brightness, etc. (standardized coefficient = 0.445, P = 0.001) and element position, size, number, etc. (standardized coefficient = 0.703, P = 0.001). The greater the number of elements, style choices and designs, etc. in an advertisement, the more pronounced the sense of social presence. All three visual types also positively predicted the social facilitation effect, measured by comments and shares. The social facilitation effect in turn positively predicted social presence (comments: standardized coefficient = 0.617, P = 0.001; shares: standardized coefficient = 0.847, P = 0.001) and advertising effect (comments: standardized coefficient = 0.674, P = 0.001; shares: standardized coefficient = 0.746, P = 0.001). Social presence positively predicted advertising effect (standardized coefficient = 0.835, P = 0.001).

Table 3. Regression results

Regression path	Standardized coefficient	T value	P value
Style selection and design→Thumb up	0.772	14.645	0.001
Element color, shape, brightness→Thumb up	0.445	5.926	0.001
Element position, size, quantity→Thumb up	0.703	11.847	0.001
Style selection and design→Comment	0.493	6.825	0.001
Element color, shape, brightness→Comment	0.173	2.036	0.001
Element position, size, quantity→Comment	0.443	6.124	0.001
Style selection and design→Sharing	0.645	9.847	0.001
Element color, shape, brightness→Sharing	0.362	4.678	0.001
Element position, size, quantity→Sharing	0.589	8.925	0.001
Comment→Thumb up	0.617	9.314	0.001
Sharing→Thumb up	0.847	18.422	0.001
Comment→Click	0.674	10.725	0.001
Sharing→Click	0.746	13.428	0.001
Thumb up→Click	0.835	18.706	0.001

4.2 User Visual Attention Response Study

4.2.1 Visual attention calculation

Natural images contain a large number of pattern elements. Related studies have shown, however, that in advertising images, although the pattern is the primary element, text and brand elements also occupy a large proportion of the space. Therefore, a visual attention algorithm trained on natural images extracts pattern information well but is not sensitive to text and brand information, and its prediction performance drops when it is applied to advertising images.

Of the text and brand elements in advertising images, text elements have more obvious characteristics, whereas brand logo elements tend to be more heavily designed and innovative and lack obvious, uniform characteristics. Because brand elements are often composed of patterns and text, the algorithm in this paper adds text element features to improve performance and does not extract brand logo features separately.

Humans have a strong center bias when viewing natural images. Research on the advertisement image visual attention data set ADD1000 has shown that human eye gaze on advertisement images also has an obvious center bias. Visual attention for advertising images is shown in Figure 2, where the visual attention value in the center region of the image is significantly higher than that in the peripheral locations. In machine vision, this image center is called the extended focus or perspective projection center. In both traditional and depth-based visual attention algorithms, the center prior is often added to models as a high-level prior feature.
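One common way to encode a center prior is a 2-D Gaussian centered on the image that multiplies a predicted saliency map. The sketch below illustrates that idea only; it is not the prior used in this paper, and the function names and the `sigma_frac` width parameter are hypothetical.

```python
import numpy as np

def center_prior(height, width, sigma_frac=0.25):
    """2-D Gaussian centered on the image with peak value 1."""
    ys = np.arange(height) - (height - 1) / 2.0   # vertical offsets from the center
    xs = np.arange(width) - (width - 1) / 2.0     # horizontal offsets from the center
    sy, sx = sigma_frac * height, sigma_frac * width
    return np.exp(-(ys[:, None] ** 2) / (2 * sy ** 2)
                  - (xs[None, :] ** 2) / (2 * sx ** 2))

def apply_center_prior(saliency, sigma_frac=0.25):
    """Weight a saliency map by the center prior and rescale it to sum to 1."""
    weighted = saliency * center_prior(saliency.shape[0], saliency.shape[1], sigma_frac)
    return weighted / weighted.sum()

prior = center_prior(5, 7)
print(prior[2, 3], prior[0, 0])          # center value is 1, corners are smaller
uniform = np.ones((5, 7))
print(apply_center_prior(uniform).sum())
```

Applied to a uniform map, the result is simply the normalized Gaussian, which shows how the prior shifts predicted attention toward the image center when the content itself gives no cue.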

4.2.2 Trends in average SIF distribution of visual attention

The 1,000 advertising images were randomly divided into eight groups of 125 images each. Each observer was asked to participate in four observation experiments, each separated by at least three days, with two sample groups per observation and a break of at least 30 minutes outdoors between each group. Observers were asked to view the images with their primary eye, which was determined by visual attention to the advertising images. In this section, the correlations between visual attention, eye-movement characteristics, personality traits, and affective preferences (i.e., subjects’ perceptions of five advertising attributes: ad liking, emotionality, functionality, aesthetics, and ad brand liking) will be preliminarily analyzed and discussed in the context of the ad images themselves.

The average SIF distribution trends of the five ad attributes on the 7-point scale are shown in Figure 3. The SIF values for the “extremely disliked” ads (corresponding to a brand liking score of 1) are unusually high, and an examination of these ad samples reveals that most of them belong to non-mainstream brands or contain obscure content scenes. This matches the common-sense expectation that unfamiliar visual content requires more visual attention. In addition, advertised brands generally consist of words, numbers and special symbols, which have a special attraction to the human eye and are more likely to trigger curiosity and thus increase visual attention, especially when familiarity is low. Conversely, this suggests that advertisements with high brand awareness are more likely to gain favorability. Ad liking, brand liking and the affective subjective ratings all showed their smallest SIF values at a score of 7. We infer that, in subjective attribute perception, more beautiful or preferred visual stimuli correspond to less average visual attention, which implies that targets rated higher on subjective attributes attract visual attention more readily or are perceived by the human eye more quickly. The SIF trends were relatively concentrated and similar in the middle score ranges of each attribute dimension, and diverged more toward the ends of the distribution. This suggests that stimuli with extreme attribute representations induce unstable and heterogeneous visual attention, whereas visual stimuli with moderate and mild attribute-specific representations are accompanied by stable visual attention.

The PLCC heat map of the correlation coefficients between the SIF distribution of each advertising attribute and the sequence of scores 1-7 is shown in Figure 4. The three more subjective dimensions of ad liking, ad aesthetics and ad brand liking all have strong negative correlations with SIF, with correlation coefficients of -0.68, -0.86 and -0.84, respectively. This suggests that the more likeable or beautiful the visual stimulus is perceived to be (the higher the subjective score), the lower the average visual attention intensity given by the human eye (the lower the SIF value), which is consistent with the above findings. We further examined samples that scored higher on these three attribute dimensions and found that their average duration of a single gaze point was shorter, which supports the above claim that visual perception of beautiful things is faster. In addition, the sequence of scores 1-7 showed a weak correlation with the SIF distribution for the emotionality of advertisements, but an extremely strong positive correlation with that for functionality (PLCC = 0.93). This suggests that stronger functional attributes correspond to larger SIF values.
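As an illustration of how such a coefficient can be obtained, the following minimal sketch computes the Pearson linear correlation coefficient (PLCC) between the score sequence 1-7 and a per-score mean SIF series. The function name `plcc` and the values in `mean_sif_example` are hypothetical placeholders chosen only to show a decreasing trend; they are not the measurements reported in Figure 3 or Figure 4.

```python
import math


def plcc(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered products and centered squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PLCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


scores = [1, 2, 3, 4, 5, 6, 7]

# Hypothetical mean SIF per score level (illustrative only, NOT from the paper).
mean_sif_example = [0.62, 0.55, 0.51, 0.50, 0.48, 0.44, 0.39]

r = plcc(scores, mean_sif_example)
print(f"PLCC between score and mean SIF: {r:.2f}")
```

A negative value of `r` corresponds to the pattern described for ad liking, aesthetics and brand liking, where higher scores go with lower SIF; a positive value corresponds to the pattern described for functionality.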

5

Conclusion

In this paper, a regression neural network model is used to explore how the visual elements of infomercials predict the advertising effect. A collaborative attention model is also used to compute the distribution and trend of user attention by combining the visual and textual features of advertisements. The experiments and analysis are based on Facebook’s private image ad dataset. The experimental results show that the color, shape and brightness of visual elements, their position, size and number, and the number of style choices and designs positively predict the social booster effect and social proximity, and that the social booster effect as a whole and social proximity in turn significantly and positively predict the advertising effect. The correlation coefficients of ad liking, ad aesthetics and ad brand liking with SIF are all less than 0, a significant negative correlation, which indicates that well-liked and aesthetically pleasing ads attract users’ attention more easily and quickly. Therefore, colors, elements, brightness and related properties should be used judiciously to create aesthetically pleasing and attractive advertisement designs that improve the advertising effect.

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Xia Yan

Anita Binti Rosli

Aryaty Binti Alwie

Published Online: Sep 23, 2025

Received: Jan 16, 2025

Accepted: Apr 23, 2025

DOI: https://doi.org/10.2478/amns-2025-0961

Keywords: Regression neural network, Synergetic attention, Visual effects, Infomercials

© 2025 Xia Yan et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
