
3D Mathematical Modelling Technology in Visual Rehearsal System of Sports Dance



Introduction

Video information is now widely used in many fields, driven by advances in computer vision and video image processing technology. Faced with a vast sea of high-dynamic dance video information, users want to rehearse and query specific action segments in these videos as quickly and efficiently as they query text, so that the action segments of interest can be retrieved, played and browsed. However, on one hand, because video databases keep growing in capacity and video data structures become increasingly complex, the workload of rehearsing specific action segments in high-dynamic dance videos is enormous [1]. On the other hand, the limitations of current hardware make it difficult to rehearse specific action segments quickly and accurately. For the time being, therefore, we can only exploit the information inherent in the high-dynamic dance video data itself. To analyse and manage high-dynamic dance videos, we need to study how to design a rehearsal system for the specific action segments in the video. The traditional embedded rehearsal method for specific action segments in high-dynamic dance videos organically combines an embedded platform with similarity matching technology to extract and rehearse the segments. To improve the precision and recall of specific action segment rehearsal in high-dynamic dance videos, more and more experts have carried out related research, and many classic methods have been proposed.

Some scholars have proposed designing a rehearsal system for specific action segments of high-dynamic dance videos based on semantic concepts [2]. They used shot segmentation and key-frame extraction to segment the specific action segments hierarchically and extract their low-level features. Finally, they used a support vector machine to detect the concepts of specific action segments and rehearsed the segments according to their conceptual content. However, the video browsing efficiency of this method is low. Other scholars have designed a rehearsal system based on the classification of moving targets. The system mainly comprises two modules: specific action segment analysis and rehearsal. The analysis module classifies and labels specific action segments to form target segment categories; the rehearsal module then quickly rehearses to the corresponding specific action segment from the input target segment category. This method suffers from low accuracy in target segment classification. Still other scholars have designed a rehearsal system based on shot content [3]. That system exploits an adjacent-scale wavelet transform within a shot detection method to detect specific action segments in high-dynamic dance videos, then uses a multi-feature adaptive threshold detection method to extract the segments and rehearses them from the extraction results. However, this method involves a large amount of calculation and has low precision and recall.

To address the problems of the above methods, this paper proposes a design method for a rehearsal system for specific action segments in high-dynamic dance videos based on similarity calculation. The experimental results show that the proposed method improves the precision and recall of rehearsal.

Design of a rehearsal system for specific action segments in high-dynamic dance videos
Feature detection of specific action segments in high-dynamic dance videos

To complete feature detection, we initialise and update the background of the specific action segment in the high-dynamic dance video and obtain the binarised foreground image of the segment [4]. The specific steps are described as follows.

Initialisation of the background image of a specific action segment in a high-dynamic dance video

We convert high-dynamic dance video frames into grey-scale images and partition them into blocks of 16 × 16 pixels. If the corresponding block changes by less than 5% between two consecutive frames of a specific action segment in the high-dynamic dance video, the block is considered unchanged. If a block remains unchanged for 10 consecutive frames, its data are filled into the corresponding part of the background [5]. We thus obtain the initial background image BG0 of the specific action segment of the high-dynamic dance video.
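
As an illustration, the following sketch implements this block-stability initialisation in Python with NumPy. The text does not specify how the 5% change is measured; here it is assumed to be the mean absolute grey-level difference of a block relative to the full intensity range.

import numpy as np

BLOCK = 16            # block size in pixels (16 x 16, as in the text)
CHANGE_THRESH = 0.05  # a block is "unchanged" if it differs by less than 5%
STABLE_FRAMES = 10    # blocks stable for 10 consecutive frames enter BG0

def init_background(gray_frames):
    """Fill the initial background BG0 from blocks that stay stable.

    gray_frames: iterable of equally sized 2-D uint8 arrays (grey-scale frames).
    Returns BG0 and a mask of the blocks that were filled.
    """
    frames = iter(gray_frames)
    prev = next(frames).astype(np.float32)
    h, w = prev.shape
    bh, bw = h // BLOCK, w // BLOCK      # remainder pixels are ignored here
    stable = np.zeros((bh, bw), dtype=int)   # consecutive-stability counters
    filled = np.zeros((bh, bw), dtype=bool)  # which blocks are already in BG0
    bg0 = np.zeros_like(prev)

    for frame in frames:
        cur = frame.astype(np.float32)
        for by in range(bh):
            for bx in range(bw):
                ys, xs = by * BLOCK, bx * BLOCK
                p = prev[ys:ys + BLOCK, xs:xs + BLOCK]
                c = cur[ys:ys + BLOCK, xs:xs + BLOCK]
                # relative change of this block between consecutive frames
                change = np.abs(c - p).mean() / 255.0
                stable[by, bx] = stable[by, bx] + 1 if change < CHANGE_THRESH else 0
                if stable[by, bx] >= STABLE_FRAMES and not filled[by, bx]:
                    bg0[ys:ys + BLOCK, xs:xs + BLOCK] = c
                    filled[by, bx] = True
        prev = cur
    return bg0, filled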

The background image update of the specific action segment of the high-dynamic dance video

We build a frame buffer pool of size K for the high-dynamic dance video; the stored frames are denoted I1, I2, …, IK, and the specific action segment frames in the frame buffer pool are sampled and updated at intervals of N frames. First, for each pixel (i, j), we sort by grey-scale intensity the values in the current background BG0 of the specific action segment and in the K buffered frames of the high-dynamic dance video [6]. Then, we select the median value to update the corresponding pixel of the specific action segment in the background image of the high-dynamic dance video.
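
A minimal sketch of this update, noting that the per-pixel sort-and-select described above reduces to taking the median over the background and the K buffered frames:

import numpy as np

def update_background(bg, frame_pool):
    """Median update of the background, per pixel.

    bg:         current background image BG0 (2-D array)
    frame_pool: list of the K buffered frames, sampled every N frames
    Returns the updated background: the per-pixel median of the background
    together with the K buffered frames.
    """
    stack = np.stack([bg] + list(frame_pool), axis=0)  # shape (K + 1, H, W)
    return np.median(stack, axis=0)                    # sort and pick the median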

Obtain the foreground image of a specific action segment of a high-dynamic dance video

In the current high-dynamic dance video image I, the R, G and B components of the specific action segment are denoted Ir, Ig and Ib, respectively. The difference images Dr, Dg and Db of the specific action segment of the high-dynamic dance video are calculated as |Ir − BG0|, |Ig − BG0| and |Ib − BG0|, respectively, and the background difference method is used for binarisation:

f_x(i,j) = \begin{cases} 1, & thresh_l(i,j) < D_x(i,j) < thresh_h(i,j),\; x \in \{r,g,b\} \\ 0, & \text{otherwise} \end{cases}

where x indexes the colour channel of the specific action segment in the high-dynamic dance video and Dx(i, j) represents the brightness difference at pixel (i, j) in channel x. The lower threshold thresh_l(i, j) and the upper threshold thresh_h(i, j) of a specific action segment in a high-dynamic dance video are calculated from the order statistics of the sorted frame buffer:

thresh_l(i,j) = \lambda \left( I_{\frac{K+1}{2}+l}(i,j) - I_{\frac{K+1}{2}-l}(i,j) \right)

thresh_h(i,j) = \lambda \left( I_{\frac{K+1}{2}+h}(i,j) - I_{\frac{K+1}{2}-h}(i,j) \right)

where λ represents the binarisation threshold coefficient of a specific action segment in a high-dynamic dance video, and the offsets l and h are determined by the length and the height of the specific action segment image, respectively [7]. The final foreground image of the specific action segment of the high-dynamic dance video is then obtained by combining the channel masks with a bitwise OR:

f(i,j) = f_r(i,j) \,|\, f_g(i,j) \,|\, f_b(i,j)
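
The following sketch applies this channel-wise background differencing and binarisation; the per-pixel threshold maps thresh_l and thresh_h are assumed to have been computed beforehand from the sorted frame buffer:

import numpy as np

def foreground_mask(frame_rgb, bg_rgb, thresh_l, thresh_h):
    """Binarise the foreground by channel-wise background differencing.

    frame_rgb, bg_rgb:  (H, W, 3) arrays holding the R, G, B channels
    thresh_l, thresh_h: (H, W) per-pixel lower/upper thresholds
    A pixel is foreground in channel x if thresh_l < D_x < thresh_h;
    the three channel masks are then combined with bitwise OR.
    """
    fg = np.zeros(frame_rgb.shape[:2], dtype=bool)
    for ch in range(3):  # x in {r, g, b}
        d = np.abs(frame_rgb[..., ch].astype(np.float32)
                   - bg_rgb[..., ch].astype(np.float32))  # D_x = |I_x - BG0|
        fg |= (d > thresh_l) & (d < thresh_h)             # f_x(i, j)
    return fg.astype(np.uint8)                            # 1 = foreground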

Similarity calculation of specific action segments in high-dynamic dance videos

Based on the feature detection results of Section ‘Feature detection of specific action segments in high-dynamic dance videos’, we first calculate the density distribution, compactness and dispersion of the specific action segment. We then calculate the active block ratio of the specific action segment and the similarity between specific action segments [8]. Finally, according to the similarity calculation result, the rehearsal of the specific action segment in the high-dynamic dance video is realised. The specific process is described as follows.

Assume that the set of pixels of a colour object Ci in a specific action segment of a high-dynamic dance video is:

s = \{ (x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n) \}

The density distribution of a specific action segment in a high-dynamic dance video is calculated as

f_{i1} = \frac{n}{k}

where k represents the total number of pixels of the specific action segment frame in the high-dynamic dance video.

The compactness distribution of a specific action segment in a high-dynamic dance video is calculated as

f_{i2} = \frac{m}{n}

where m represents the number of pixels in the set s whose four-connected neighbours also belong to s.

The dispersion degree of a specific action segment of a high-dynamic dance video is

f_{i3} = \frac{1}{n\sqrt{k}} \sum_{j=1}^{n} \sqrt{(x_j - x_\mu)^2 + (y_j - y_\mu)^2}

where (x_j, y_j) are the coordinates of the pixels in s and (x_μ, y_μ) is their centroid:

x_\mu = \frac{1}{n}\sum_{j=1}^{n} x_j

y_\mu = \frac{1}{n}\sum_{j=1}^{n} y_j
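
A sketch of the three spatial features in Python. The four-connectivity condition for m is slightly ambiguous in the text; this version assumes it counts the pixels whose four neighbours all belong to s:

import numpy as np

def spatial_features(s, k):
    """Density, compactness and dispersion of one colour object.

    s: (n, 2) integer array of (x, y) pixel coordinates of the colour object
    k: total number of pixels of the frame
    """
    n = len(s)
    f1 = n / k                                  # density distribution f_i1

    # compactness: fraction of pixels whose 4-neighbours all belong to s
    pix = {tuple(p) for p in s}
    m = sum(all((x + dx, y + dy) in pix
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))
            for x, y in pix)
    f2 = m / n                                  # compactness f_i2

    centroid = s.mean(axis=0)                   # (x_mu, y_mu)
    d = np.linalg.norm(s - centroid, axis=1)    # distance of each pixel to centroid
    f3 = d.sum() / (n * np.sqrt(k))             # dispersion f_i3
    return f1, f2, f3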

To define the fourth feature of a specific action segment of a high-dynamic dance video, we divide the segment into blocks of 16 × 16 pixels of the same size. If a divided block contains some subset of s, the block is considered active [9]. Assuming that the number of active blocks of a specific action segment in a high-dynamic dance video is q and the total number of blocks is p, the active block ratio of the specific action segment is calculated as

f_{i4} = \frac{q}{p}

According to the above calculations, we take the averages of the spatial features of the specific action segment in the high-dynamic dance video. We use $\overline{f}_{i1}$, $\overline{f}_{i2}$, $\overline{f}_{i3}$ and $\overline{f}_{i4}$ to represent the average feature values of a colour object Ci in a specific action segment of a high-dynamic dance video [10]. The spatial distribution difference between two colour objects Ci and Cj in a specific action segment of a high-dynamic dance video is calculated as

D_s(C_i, C_j) = \frac{1}{4}\left( |\overline{f}_{i1} - \overline{f}_{j1}| + |\overline{f}_{i2} - \overline{f}_{j2}| + |\overline{f}_{i3} - \overline{f}_{j3}| + |\overline{f}_{i4} - \overline{f}_{j4}| \right)

According to the above calculation results, we use the Canny edge detection operator to perform edge detection on the 16 × 16 pixel blocks of a specific action segment of a high-dynamic dance video. If the number of edge points in a block is greater than a set threshold, the block is determined to have texture. We then calculate the ratio of texture blocks for each frame of the specific action segment and its average value over the segment [11]. Finally, the texture similarity between two specific action segments is taken as the minimum of their two average texture-block ratios.
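
The block-level features can be sketched as follows; the Canny thresholds and the per-block edge-point threshold are illustrative values rather than values taken from the paper:

import cv2
import numpy as np

BLOCK = 16

def active_block_ratio(mask):
    """f_i4 = q / p: fraction of 16 x 16 blocks containing object pixels."""
    h, w = mask.shape
    blocks = mask[:h - h % BLOCK, :w - w % BLOCK].reshape(
        h // BLOCK, BLOCK, w // BLOCK, BLOCK)
    active = blocks.any(axis=(1, 3))      # a block is active if it meets s
    return active.sum() / active.size     # q / p

def texture_block_ratio(gray, edge_points_thresh=32):
    """Fraction of blocks whose Canny edge-point count exceeds a threshold."""
    edges = cv2.Canny(gray, 100, 200)     # edge map of the frame
    h, w = edges.shape
    blocks = (edges[:h - h % BLOCK, :w - w % BLOCK] > 0).reshape(
        h // BLOCK, BLOCK, w // BLOCK, BLOCK)
    textured = blocks.sum(axis=(1, 3)) > edge_points_thresh
    return textured.sum() / textured.size

def spatial_difference(fbar_i, fbar_j):
    """D_s(C_i, C_j): mean absolute difference of the four averaged features."""
    return np.abs(np.asarray(fbar_i) - np.asarray(fbar_j)).mean()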

Assume that A and B represent all colour objects in specific action segments S1 and S2 of a high-dynamic dance video, respectively. For a colour object u ∈ A, a colour object υ ∈ B forms a similar colour pair with u if it satisfies ||u − υ|| ≤ ɛ, where ||u − υ|| represents the Euclidean distance between u and υ in the HSV colour space and ɛ represents the threshold of a specific action segment. In this case, (u, υ) is a similar colour pair in the specific action segment of the high-dynamic dance video.

Let Ω = {(u, υ)|(u, υ) ∈ A × B} be the set of similar colour pairs, and let $\overline{H}_1$ and $\overline{H}_2$ denote the average histograms of the specific action segments S1 and S2 of the high-dynamic dance video. The similarity between S1 and S2 is calculated as

sim(S_1, S_2) = \frac{1}{k} \sum_{(u,\upsilon) \in \Omega} W\!\left(D_s(u,\upsilon)\right) \min\left[ \overline{H}_1(u), \overline{H}_2(\upsilon) \right] + w_t \min(t_1, t_2)

where t1 and t2 represent the average texture-block ratios of the specific action segments S1 and S2, respectively, wt is a texture weight and W is the weighting function of the specific action segment of the high-dynamic dance. The weight function W is a Sigmoid function:

W(x) = \frac{1}{1 + e^{ax+b}}

where a and b are two parameters of the specific action segment. Assume that two specific action segments in a high-dynamic dance video are represented as:

G_1 = \{ S_{11}, S_{12}, \cdots, S_{1m} \}

G_2 = \{ S_{21}, S_{22}, \cdots, S_{2n} \}

Equations (15) and (16) contain m and n specific action sub-segments, respectively. The similarity between the two specific action segments G1 and G2 in the high-dynamic dance video is then

sim(G_1, G_2) = \frac{1}{m \times n} \sum_{i=1}^{m} \sum_{j=1}^{n} \left\{ sim(S_{1i}, S_{2j}) \times e^{-|i-j|^2/\delta^2} \right\}

where sim(S1i, S2j) represents the similarity between the i-th sub-segment of G1 and the j-th sub-segment of G2, δ represents a normalisation parameter of the specific action segment and the factor $e^{-|i-j|^2/\delta^2}$ down-weights pairs of sub-segments whose positions in the two segments do not correspond. According to formula (17), the average of the similarity values of the sub-segments of the high-dynamic dance video is calculated and taken as the similarity of the specific action segment, and the rehearsal is completed while preserving the temporal order of the specific action segments in the high-dynamic dance video.
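
A sketch of the similarity computation under the reconstruction above. Whether W(D_s) multiplies the histogram-intersection term or is added to it is ambiguous in the extracted formula; the multiplication reading is assumed here, as is the negative exponent in the position weight:

import numpy as np

def sigmoid_weight(x, a, b):
    """W(x) = 1 / (1 + exp(a*x + b)); a and b are tuning parameters."""
    return 1.0 / (1.0 + np.exp(a * x + b))

def segment_similarity(pairs, H1, H2, t1, t2, k, w_t, a, b):
    """sim(S1, S2) over the similar-colour-pair set Omega.

    pairs:  iterable of (u, v, d_s) for each similar colour pair in Omega,
            where d_s = D_s(u, v)
    H1, H2: average histograms of S1 and S2, indexed by colour object
    t1, t2: average texture-block ratios of S1 and S2
    k:      normalising constant from the formula above
    """
    acc = sum(sigmoid_weight(d_s, a, b) * min(H1[u], H2[v])
              for u, v, d_s in pairs)
    return acc / k + w_t * min(t1, t2)

def group_similarity(sub_sim, delta):
    """sim(G1, G2): position-weighted mean of sub-segment similarities.

    sub_sim: (m, n) matrix with sub_sim[i, j] = sim(S_1i, S_2j)
    """
    m, n = sub_sim.shape
    i = np.arange(m)[:, None]
    j = np.arange(n)[None, :]
    w = np.exp(-np.abs(i - j) ** 2 / delta ** 2)  # down-weights mismatched positions
    return (sub_sim * w).sum() / (m * n)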

Design framework of specific action segment rehearsal system in high-dynamic dance video

Based on the above calculations, we design the framework of a rehearsal system for specific action segments of high-dynamic dance videos. The system mainly includes a feature detection module for specific action segments and a rehearsal module for specific action segments. The overall framework is shown in Figure 1.

Fig. 1

The framework of a specific action segment rehearsal system in a high-dynamic dance video

As Figure 1 shows, the feature detection module for specific action segments in a high-dynamic dance video consists of three parts: background initialisation, background update and foreground image acquisition of the specific action segment. The rehearsal module for specific action segments in a high-dynamic dance video consists of four parts: calculation of the density distribution, compactness and dispersion of specific action segments, and calculation of the similarity between specific action segments.

Motion capture is a technology that converts human motion information into data that a computer can recognise. The captured data transform the human body's movements in real 3D space into virtual 3D space data in the computer. The motion capture system first captures the motion of the human body and then maps the captured motion to a computer-generated virtual model; the object of motion capture is the motion of the model. We attach marker points ('Markers') to the joint points of the model and use the capture lenses of the motion capture device to sample the positions of these markers in space, thereby generating a set of motion data that the computer can recognise. With the motion data, we can move into 3D animation software. There we build 3D human body models, each of which is a polygonal mesh model.

The model simulates the dancer's standard body as closely as possible, and the facial features reflect the main facial characteristics of the various ethnic groups. After the human body model is established, the clothing and apparel models must be established [12]. The dances and costumes of different ethnic groups differ, so multiple sets of costume models, distinguished by men and women, must be built. Next, we create a 3D skeleton and put it into accurate correspondence with the model to prepare for the skinning step. Skinning binds the bones to the model so that the two move together correctly; the various clothing and apparel must be bound at the same time. The data collected by the motion capture system are then imported and combined with the bones in the 3D animation to drive the 3D model and generate the animation. These animations can easily be imported into the virtual presentation platform and rendered into video files for presentation, saving and so on, in preparation for subsequent applications. The display system is implemented with the current mainstream 3D game engine Unity3D and mainly includes the following modules (Figure 2):

Fig. 2

System modules

Opening title module

A 3D animation video plays like the opening scene of a game or of a film and television work. The system therefore also includes a menu system that displays the system name, the production unit, module selection and so on.

Dance selection module

The system selects the dance type according to the classification of the various dances, which mainly include the following: (1) self-entertainment dances, which adopt an accessible and lively mode without a fixed programme; (2) dances based on legends spread by the various ethnic groups; (3) collective dances of the masses at festivals and grand gatherings and (4) folk dances gathered in the square during festivals.

Camera control module

Unity3D is a mainstream 3D game engine with powerful control capabilities. When observing the dance movements of the 3D model, we can control the viewing perspective with the mouse and keyboard, as in a game, and show the dance works in 360° without dead angles [13]. This module can use the engine's default camera control or a controller programmed by the developer; in addition, the system supports other control modes.
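
Unity3D camera controllers are written in C#; purely as an engine-agnostic illustration of the orbit maths behind such a 360° view, the mouse-driven angles can be mapped to a camera position as follows (the function and parameter names are hypothetical):

import math

def orbit_camera(target, yaw_deg, pitch_deg, distance):
    """Position a camera on a sphere around `target` from mouse-driven angles.

    Returns the (x, y, z) camera position; the camera should then be
    oriented to look at `target`.
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    x = target[0] + distance * math.cos(pitch) * math.sin(yaw)
    y = target[1] + distance * math.sin(pitch)
    z = target[2] + distance * math.cos(pitch) * math.cos(yaw)
    return (x, y, z)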

Experimental results and analysis

The experiment uses 500 high-dynamic dance videos downloaded from the Internet, totalling 120 h. The experimental environment is a Xeon E5410 at 3.2 GHz with 4 GB of memory, running the Windows 8 operating system. We compare and analyse the proposed method against the methods described above, testing rehearsal requests for specific action segments in 200 high-dynamic dance videos. Finally, we record the recall rate and precision rate of specific action segment rehearsal and the running time of the rehearsal system.

To verify the effectiveness of the method proposed in this article, we first define the precision and recall of specific action segment rehearsal in high-dynamic dance videos and use them as the standard for measuring the performance of the rehearsal system [14]. We then plot the recall and precision curves of specific action segment rehearsal for the three system design methods, as shown in Figures 3 and 4.
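
The article does not restate the definitions; in the standard information-retrieval sense assumed here, they are:

precision = (number of correctly rehearsed specific action segments) / (total number of segments returned by the rehearsal)

recall = (number of correctly rehearsed specific action segments) / (total number of relevant specific action segments in the video database)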

Fig. 3

Recall comparison of specific action segment rehearsal in high-dynamic dance videos

Fig. 4

Precision comparison of specific action segment rehearsal in high-dynamic dance videos

Figures 3 and 4 show that the precision of specific action segment rehearsal with the proposed method remains at about 80%, and that its precision and recall rise together. Because the proposed method calculates the density distribution, compactness, dispersion and similarity between specific action segments in high-dynamic dance videos, the precision and recall of the rehearsal system are improved. The proposed system therefore has good practical performance.

Conclusion

When using current methods to rehearse specific action segments in high-dynamic dance videos, we found that the running time was long and the rehearsal accuracy was low. For this reason, we propose a design method for a rehearsal system for specific action segments in high-dynamic dance videos based on similarity calculation. Experiments show that, because it calculates the density distribution, compactness, dispersion and similarity between specific action segments in high-dynamic dance videos, the proposed method improves the precision and recall of the rehearsal system. As a result, the system has good practical performance.
