Open Access

Research on motion capture of dance training pose based on statistical analysis of mathematical similarity matching



Introduction

In computer science research today, there is growing interest in using computer recognition to perceive the external environment and provide people with more intelligent information and services. Computer vision uses one or more cameras to acquire information about the environment, feeds the captured video or images into a computer, converts them into digital signals and processes them to analyse and understand the underlying data. The process of judging the corresponding motion can be divided into low, medium and high levels according to the research object, covering features such as edge gradients, texture grey values, contours and location information, up to a deep understanding of the scene. The classification of computer vision is shown in Figure 1 [1, 2].

Fig. 1

Schematic diagram of computer vision classification.

Computer vision can enhance the accuracy, effectiveness and robustness of information capture, and it has broad application value and research significance in the following areas.

Motion analysis is one of the main research topics in computer vision. Applying motion analysis to dance teaching allows trainees' dance movements to be analysed and assessed, helping trainers to identify irregular movements in time. The action parameters are converted into a three-dimensional (3D) motion model, a teaching sample library is established, and the analysis results serve as important indicators of teaching quality, helping to improve the quality of education [3].

The research in this thesis combines motion analysis in computer vision with college dance teaching. It studies the capture and analysis of human motion characteristics, and detects and discriminates moving samples in video, in order to analyse human motion posture and put teaching and training on a scientific footing. Using the accurate capture of high-precision infrared cameras and the precise processing of the computer system, more information can be obtained, such as the displacement, angle and speed of limb and joint movements. This intelligent method provides a realistic and effective theoretical basis for learning dance.

Human pose analysis

Human motion posture analysis first emerged in the West. With the advent of the new century, research experts in China also began to study and analyse 3D posture. Generally, the data obtained from motion capture devices or video vary discretely. Posture data parameters are extracted from the discrete data to determine the law of posture change, and mathematical similarity analysis is one of the main approaches. At this stage, there are several methods of posture analysis [4]:

Attitude analysis based on simulated annealing particle swarm algorithm

The literature mainly studies the posture of the human arm. Video is used to obtain the parameters of the arm's posture. The particle filter algorithm uses the weights of the sampling points to approximate the conditional density, and the approximate weights are updated randomly by the Monte Carlo method. The random motion of the particles is then predicted, and the range of particle motion is measured to extract the posture of the human arm. The literature introduced the annealing particle swarm optimisation method into the analysis of human motion posture. First, the captured 3D parameters are reduced by PCA to obtain a low-dimensional, compact pose space. Second, a fitness function is constructed from human silhouette features in the low-dimensional space to measure the similarity between the model and the image features. Finally, the annealing particle swarm optimisation algorithm is used to improve the analysis efficiency [5]:
$$E\left( {y_t},{x_t} \right) = S \times \left( \left( 1 - \beta \right){{B_t} \over {B_t} + {Y_t}} + \beta {{R_t} \over {R_t} + {Y_t}} \right)$$
where R_t is the silhouette area of the human body in the image, B_t is the projected area of the model and Y_t is the overlap area of the two.
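As a rough illustration, the fitness function above translates directly into code; the function name and the default scale factor S = 1 are assumptions of this sketch, not part of the cited method:

```python
def fitness(B_t, R_t, Y_t, beta, S=1.0):
    """Fitness E(y_t, x_t) combining the model projection area B_t,
    the image silhouette area R_t and their overlap area Y_t.
    beta weights the two overlap ratios; S is a scale factor."""
    return S * ((1 - beta) * B_t / (B_t + Y_t) + beta * R_t / (R_t + Y_t))
```

With a perfect overlap (B_t = R_t = Y_t) both ratios equal 1/2, so the fitness reduces to S/2 regardless of beta.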

Model-based motion posture analysis

This method first establishes a corresponding representation model of the human body, then maps the model into a 2D image and expresses the similarity between the model and the image data by some evaluation function. In the model selection and establishment stage, a good model representation can approximate the human body in an image; for example, the human skeleton is represented by a stick figure. In the method of least-squares curve fitting, the law of human motion is approximated by the obtained polynomial coefficients [6]. First, a trial function such as f(t) = a_0 t + a_1 is computed for the discrete coordinate data points (x_i, y_i, z_i) as functions of time, to determine the approximate trend. Then a basis space Φ = span{φ_0(t), φ_1(t), ⋯, φ_n(t)} is chosen, and the most suitable function in this coefficient space is found, that is,
$$s(t) = {a_0}{\varphi _0}(t) + {a_1}{\varphi _1}(t) + \cdots + {a_n}{\varphi _n}(t)$$
The function s(t) that minimises the residual is the best fit to f(t), as shown in Figure 2. Geometrically speaking, this method finds, for the given sequence of points, the curve with the smallest sum of squared distances.

Fig. 2

Least squares fitting curve.
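To make the least-squares idea concrete, the following minimal sketch fits a quadratic to one coordinate of a sampled marker trajectory; the sample values and the choice of NumPy's polynomial basis (`polyfit`) are assumptions for illustration:

```python
import numpy as np

# Sample times and one coordinate (e.g. x_i) of a marker trajectory.
t = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
x = np.array([0.0, 0.18, 0.41, 0.62, 0.79, 1.02])

# Fit s(t) = a0*t^2 + a1*t + a2: the coefficients minimise the sum of
# squared residuals between s(t_i) and the observed x_i.
coeffs = np.polyfit(t, x, deg=2)
s = np.poly1d(coeffs)

# Sum of squared distances -- the quantity least squares minimises.
residual = np.sum((s(t) - x) ** 2)
```

The fitted polynomial `s` can then be evaluated at any time to approximate the law of motion between the sampled frames.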

Feature-based motion posture analysis

This type of method does not require modelling in advance, and feature selection is the basic work of attitude analysis. The common features are points, lines, faces, blocks, corners and other more complex features. For example, the centroid of the human body is used as a pose feature to represent the orientation of the frame where it is located, and the centroid position of the person in the next frame is estimated by changing the centroid position [7].

Basic theory of motion capture technology
Components of a motion capture system

With the continuous progress and development of computer hardware facilities, motion capture devices are becoming more and more commercialised. Different companies have relatively different configurations of motion capture devices, but in general, relatively classic capture devices often include the following components:

Sensor: In a motion capture system, the sensor is a tracking device fixed firmly to key parts of the moving target. Through the physical signals sent by the sensor, the motion capture device can capture the displacement, speed and angle of the moving object. Different inspection objects and inspection sites usually require different numbers of sensors. For example, capturing whole-body movement usually requires relatively few sensors, while tracking research such as gesture capture needs more sensors [8].

Signal capture device: This kind of device differs according to the type of motion capture system. Its main function is to capture the position signal supplied by the sensor. A mechanical capture system uses a circuit board to collect electrical signals, but the now widely used optical capture system relies on the high resolution of infrared cameras to capture all signals.

Data transmission equipment: This equipment transmits all the motion parameters acquired by the motion capture device to the computer system, which analyses and processes the relevant parameters in a timely manner, maintaining the accuracy and efficiency of the transmission process.

Data processing equipment: The motion data captured by motion capture equipment needs to be corrected and processed. After that, it is matched with the 3D model. The processed data makes the 3D model more realistic and natural.

Optical motion capture system

This article studies an optical capture system, the first practical optical capture product in China: the Dongfangxin DIMS9100. The system uses multiple near-infrared high-sensitivity cameras and an independently developed 'S/D three-dimensional space calibration algorithm' to improve real-time 3D data quality. The main hardware of the system includes a high-speed 3D data capture workstation, a DIMS controller, dedicated near-infrared high-sensitivity cameras, high-sensitivity marking points and high-precision 3D static and dynamic calibration rods, as shown in Figure 3. The main operation flow of the system is shown in Figure 4 [9].

Fig. 3

Optical capture device hardware facilities.

Fig. 4

Flow chart of the optical operating system.

Analysis of dance pose based on feature vector matching
Acquisition of 3D data for motion capture

In this paper, an optical motion capture system is used to achieve the acquisition of motion data, to establish a database of human motion pose models and skeleton models. The basic flow is as shown in Figure 5.

Fig. 5

3D data acquisition flowchart.

Human motion pose analysis based on feature vector matching

Motion posture analysis is the process of tracking, capturing, acquiring and analysing the features of the human body to obtain the relevant parameters. Through the effective combination of motion analysis and teaching, the teaching system can be made more personalised and targeted. It can also decompose the performers' movements in detail and demonstrate each dance movement step by step. The parameters obtained are helpful for quantitative analysis of the movement posture, providing good support for more scientific and intelligent dance teaching [10].

In order to better analyse the movement status of dance performers, a method of analysing human motion poses using the principle of feature-plane similarity matching is proposed. This method simplifies the traditional Euclidean-distance calculation over multiple identification points to a calculation over feature-plane feature vectors and their included angles. In this paper, the identification points of 21 key parts are simplified into seven feature planes to calculate the difference and correlation of motion. After verification, this method can quickly and effectively analyse human movement posture, and applying it to dance teaching improves teaching efficiency. The specific process is shown in Figure 6, where the main steps of the analysis process are as follows:

Step 1. Real-time acquisition of skeleton data: The optical motion capture method is used to acquire the dance action sequence in real-time, and the coordinates of each identification point of the human model in the space coordinate system are stored.

Step 2. Posture analysis: Determine seven feature planes from the feature points, extract the feature vectors and their included angles as the pose features, and calculate the feature correlation coefficients of the human pose based on the movement characteristics of the key parts of the dance action.

Step 3. Analysis of feature pose difference: Through the correlation coefficients of feature vectors and their included angles, analyse the differences and accuracy of student dance movements and standard movements.

Fig. 6

Method flow chart.

Traditional 3D model similarity matching

The similarity matching of human poses measures the pose difference or similarity between different human bodies. The most commonly used method is the traditional Euclidean distance measurement.

The traditional 3D model similarity matching is based on the Euclidean distance, calculated as follows:
$$D = \sqrt{ {\left( {x_1} - {x_2} \right)}^2 + {({y_1} - {y_2})}^2 }$$
$$D = \sqrt{ {\left( {x_1} - {x_2} \right)}^2 + {({y_1} - {y_2})}^2 + {({z_1} - {z_2})}^2 }$$
where X_1 = (X_11, X_12, X_13, ⋯, X_1n) and X_2 = (X_21, X_22, X_23, ⋯, X_2n) are n-dimensional data. The Euclidean distance between every two marking points can be calculated by this formula. If the difference is less than a threshold set by the coach, the two marking points are considered similar; if it is greater than the threshold, they are considered dissimilar. The trajectories of the same marked point in the standard action and the measured action are compared, as shown in Figure 7 [11].

Fig. 7

Comparison of motion trajectories in a single direction for a single identification point.

The direct comparison method based on the traditional Euclidean distance compares the movement trajectories of two moving targets. For each action sequence, the corresponding distance difference is obtained, and the degree of data matching is judged against a preset threshold. However, this method not only involves a large amount of calculation but also depends on the inherent characteristics of the test object. When body proportions, such as height or weight, change, the distances between the marking points change accordingly, and the settings and calculations must be repeated. The strict requirements on the moving object therefore limit the use of the traditional method, which both greatly reduces computational efficiency and lacks universality [12].
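The threshold comparison described above can be sketched as follows; the marker coordinates and the threshold value are illustrative assumptions:

```python
import math

def euclidean(p, q):
    """Straight-line (Euclidean) distance between two 3D marker positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def similar(p, q, threshold):
    """Markers closer than the coach-set threshold count as similar."""
    return euclidean(p, q) < threshold

# Same identification point in a standard pose and a trainee pose.
standard = (0.10, 1.25, 0.30)
trainee = (0.12, 1.22, 0.33)
print(similar(standard, trainee, threshold=0.05))  # distance ~ 0.047 -> True
```

Note that the threshold must be re-tuned whenever the subject's body proportions change, which is exactly the drawback the feature-plane method below avoids.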

Similarity calculation based on feature plane matching

This article uses a motion capture system to extract a human skeleton model. First, a feature plane can be determined from three feature points, and the 21 identification points on the human motion posture skeleton are converted into 7 feature planes as basic calculation planes. Each feature plane represents a part of the human body, as shown in Figure 8. Seven feature-plane normal vectors (V1–V7) are extracted to determine the difference in the overall direction of the motion pose. Second, the local regularity of the motion pose is determined by the angles (θ1–θ7) between the feature-plane edge vectors. In addition, the angles (θ8–θ12) between the feature-plane edge vectors and the vertical trunk direction (VStand) capture the local relationship between the limbs and the torso. In this way, a two-tuple 〈V, θ〉 is used as the input of the model similarity calculation, and the correlation parameters of 〈V, θ〉 are used as the output. This method effectively avoids the calculation errors caused by the inherent characteristics of the measured object and reduces the computational complexity, thereby improving the efficiency and stability of human posture analysis [13].

Fig. 8

Plan view of human bone features.
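A feature-plane normal of the kind listed in Table 1 can be computed as the cross product of two edge vectors spanning the plane; the marker coordinates and helper name below are assumptions for this sketch:

```python
import numpy as np

def plane_normal(a, b, c):
    """Unit normal of the feature plane through markers a, b, c,
    e.g. V1 = V_LLarm x V_LFarm for the left-arm plane P1."""
    v1 = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)  # first edge vector
    v2 = np.asarray(c, dtype=float) - np.asarray(a, dtype=float)  # second edge vector
    n = np.cross(v1, v2)
    return n / np.linalg.norm(n)

# Illustrative shoulder/elbow/wrist markers (coordinates are assumptions).
shoulder, elbow, wrist = (0.0, 1.4, 0.0), (0.3, 1.4, 0.0), (0.3, 1.1, 0.1)
print(plane_normal(shoulder, elbow, wrist))
```

Because the normal is normalised to unit length, it encodes only the plane's orientation, independent of the subject's limb lengths.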

According to the normative requirements of dance movements and the relative range of movement of the human skeleton, the feature vectors of the main movement parts of the human body and their included angles are specified as shown in Tables 1 and 2:

The movement direction of the limbs can be determined from the inner product of the normal vector Vm (m = 1,2,3,4) of the limb feature plane Pm (m = 1,2,3,4) with the vertical trunk vector VStand, and the angle θm (m = 1,2,3,4) accurately determines the regularity of the limb movement.

Comparing the normal vector V5 of the head feature plane P5 with the vertical standing direction VStand gives the head angle θ5 of the motion model. When the human body looks straight ahead, V5 and VStand are parallel [14].

The torso includes the chest and hips, and its movements are mainly turning and bending. For the turning movement, the spine direction vector V6 is compared with the vertical standing direction VStand via the angle θ6. For the bending movement, when the human body stands upright, the hip plane P7 remains horizontal and its plane normal vector V7 is parallel to the vertical direction VStand.

Table 1. Discriminant vectors on the feature plane.

Feature plane    Feature vector

Left arm (P1)    V1 = VLLarm × VLFarm
Right arm (P2)   V2 = VRLarm × VRFarm
Left leg (P3)    V3 = VLThigh × VLCrus
Right leg (P4)   V4 = VRThigh × VRCrus
Head (P5)        V5 = VLHead × VRHead
Chest (P6)       V6 = VLChest × VRChest
Hip (P7)         V7 = VLHip × VRHip

Table 2. Determining the included angle on the feature plane.

Feature plane     Geometric relationship     θMax    θMin

Left elbow (P1)   θ1 = <VLFarm, VLLarm>      180°    40°
Right elbow (P2)  θ2 = <VRFarm, VRLarm>      180°    40°
Left knee (P3)    θ3 = <VLThigh, VLCrus>     180°    35°
Right knee (P4)   θ4 = <VRThigh, VRCrus>     180°    35°
Head (P5)         θ5 = <VRHead, VLHead>      -       -
Chest (P6)        θ6 = <VRChest, VLChest>    -       -
Hip (P7)          θ7 = <VRHip, VLHip>        -       -

The angle between the feature plane edge vector and the vertical direction is used to determine the relationship between the limbs and the trunk joints, and the connection angle is determined as shown in Table 3:

Table 3. The angle between the joints.

Joint            Geometric relationship    θMax    θMin

Left shoulder    θ8 = <VLFarm, VStand>     180°    -
Right shoulder   θ9 = <VRFarm, VStand>     180°    -
Left hip         θ10 = <VLThigh, VStand>   45°     -
Right hip        θ11 = <VRThigh, VStand>   45°     -
Head             θ12 = <V5, VStand>        45°     -

In this paper, the cosine similarity is used as the similarity function. It can measure not only the degree of difference between vectors but also the similarity and difference between angles. From the cosine of the angle given by the inner product of two vectors in space, the difference between the feature vectors can be measured. The calculation is as follows:
$${\theta _{\left\langle {i,j} \right\rangle }} = \arccos \left( {similarity\left( {{V_i},{V_j}} \right)} \right)$$
The effective movement range of θ⟨i,j⟩, with value range [θmin, θmax], is used to judge whether the amplitude of the dance movement is standard. If θ⟨i,j⟩ < θmin or θ⟨i,j⟩ > θmax, the corresponding limb movement is wrong and the next operation cannot be performed. If θmin ≤ θ⟨i,j⟩ ≤ θmax, the corresponding limb movement is within the discriminable range, and the next operation is performed.
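The angle extraction and range check can be sketched as follows; the function names are assumptions, and the elbow limits used in the example are the 40°–180° range from Table 2:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def angle(u, v):
    """theta_<i,j> = arccos(similarity(V_i, V_j)), in radians.
    The cosine is clamped to [-1, 1] to guard against rounding."""
    return math.acos(max(-1.0, min(1.0, cosine_similarity(u, v))))

def within_range(theta_deg, theta_min, theta_max):
    """A movement is acceptable only if theta_min <= theta <= theta_max."""
    return theta_min <= theta_deg <= theta_max

# Left elbow check against the Table 2 limits (40 to 180 degrees).
theta = math.degrees(angle((1, 0, 0), (0, 1, 0)))  # 90 degrees
print(within_range(theta, 40, 180))  # True
```

Any angle outside the table's range flags the corresponding limb movement as wrong, so the next processing step is skipped for that frame.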

Because of individual differences between human bodies, such as height, weight and arm length, while each body's own proportions remain constant, it is necessary to measure whether the limb's motion amplitude meets the standard through the similarity of angles rather than absolute distances. The calculation is as follows:
$$Corr\left( {{\theta _{\left\langle {i,j} \right\rangle }}} \right) = 1 - \left( {{{\arccos \left( {similarity\left( {{V_i},{V_j}} \right)} \right)} \over \pi }} \right)$$
Taking the movement of the dance trainee's left arm as an example, with the left-arm feature plane P1 as the basic calculation plane, three discriminative parameters {Sim(V1, VStand), Corr(θ1), Corr(θ8)} are obtained, and the overall posture of the left arm is determined from them. Experiments show that the calculation error is effectively reduced. The results are shown in Table 4 [15].

Table 4. Correlation parameters of left arm gesture (0–1 s).

Test subject     Sim(V1, VStand)   Corr(θ1)   Corr(θ8)

Standard object  0.6943            0.9729     1
Test object      0.7025            0.8299     0.9815
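The correlation measure Corr can be sketched as below; since it depends only on vector directions, it is unchanged by scaling, which is why it tolerates differences in limb length between subjects:

```python
import math

def corr(u, v):
    """Corr(theta_<i,j>) = 1 - arccos(similarity(V_i, V_j)) / pi.
    Returns 1.0 for identical directions and 0.0 for opposite ones;
    scale-invariant, so limb-length differences do not affect it."""
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (math.sqrt(sum(a * a for a in u)) *
                 math.sqrt(sum(b * b for b in v)))
    return 1 - math.acos(max(-1.0, min(1.0, cos))) / math.pi

print(corr((1, 0, 0), (1, 0, 0)))   # 1.0 for identical directions
print(corr((1, 0, 0), (-1, 0, 0)))  # 0.0 for opposite directions
```

A trainee's parameters can then be compared against the standard object's row in Table 4 to quantify how far each feature deviates.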
Analysis of experimental results

This experiment was completed on a PC with a Core(TM) i5-3470 3.2 GHz CPU and 4 GB of memory, with MATLAB as the development environment. The motion database created contains 18 sets of dance action clips, each of about 1200 frames. The experimental subjects were randomly selected college students with a basic dance foundation.

The subjects were required to imitate the standard actions of the dance teacher, perform the corresponding dance moves under the optical motion capture system and have the kinematic characteristics of the joint points of their left arm extracted. For comparison with the standard actions, this article mainly uses a single segment of dance movement (0–10 s), within which the final movement of the left arm is compared experimentally. Taking a real-time captured local motion sequence as an example, as shown in Figure 9, the differences between the main movement changes of the test object and the standard movements are compared [16].

Fig. 9

Movement differences of the skeleton model.

The comparative analysis of the degree of difference shows that the elbow flexion of the test object in the 4–6 s interval and the left-arm swing amplitude in the 6–8 s interval differ significantly from the standard movements, as seen in the comparison chart of left-arm movement posture differences (Figure 10). The difference between the action to be measured and the standard action can be clearly seen in the figure [17].

Fig. 10

Timing diagram of left arm motion correlation parameters.

Comparison and verification of the experimental results show that feature-plane similarity matching can clearly and efficiently detect how a moving object's posture differs from the standard in motion pose analysis. It is highly robust and lays the foundation for scientific basic dance training [18].

Conclusion

The main advantage of this research is that it connects dance teaching with an optical motion capture system, which improves the intuitiveness of learning, enables real-time data collection and analysis, and provides timely feedback for teaching. At the technical level, it provides scientific theoretical support for innovation in teaching form and student acceptance, removes the interfering factors of the traditional teaching mode, provides a reliable basis for improving the teaching mode, and helps build a personalised teaching system. The next major research task is to complete the real-time analysis of human motion posture with the assistance of an optical motion capture system.

eISSN: 2444-8656
Language: English
Publication timeframe: Volume Open
Journal Subjects: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics