Design and Implementation Strategy of Informative Training System for Tennis Physical Education

As an elegant sport, tennis has gradually become popular in China’s colleges and universities and is well liked by students. Tennis matches are frequently held between institutions and between departments within them, so the level of college tennis directly affects how watchable these matches are [1–3]. At the same time, the skill level of college tennis players directly drives the popularization and development of the sport on campus [4–5]. With the rapid development of information technology, its application in sports has become increasingly extensive. In tennis, information technology not only changes the traditional training mode but also provides athletes with more scientific and efficient training methods [6–8].

In tennis training, data analysis technology can provide rich information support for coaches by collecting, organizing and analyzing athletes’ training data and game data, making the training process more accurate and efficient [9–10].

Virtual reality technology, also known as "spiritual realm" technology, mainly refers to an advanced human-computer interface technology characterized by immersion, interactivity, and imagination. It integrates computer technology, simulation technology, multimedia technology, artificial intelligence, microelectronics, sensing technology, and related fields. It can simulate human sensory functions such as vision, hearing, and touch, so that a person can be immersed in the virtual environment created by the computer and interact with it in real time through language, gestures, and other means, creating an adapted multidimensional information space with very broad application prospects [11–12]. In tennis, virtual reality technology not only provides athletes with a non-physical but highly realistic training environment, but also greatly improves the safety and efficiency of training [13–14].

Wearable devices work through sensors worn on the human body, which transmit body activity information to other electronic devices over a wireless network. As a new type of intelligent tool, wearable devices are small and portable, feature-rich, energy-saving, and environmentally friendly. Smart wearable devices can monitor athletes’ physical condition and sports data in real time, providing coaches with detailed and accurate data support and thereby helping athletes improve their training effect and competition performance [15–18].

This project studies a tennis action recognition and evaluation system based on human skeletal keypoints. A target detection algorithm based on OpenPose-bm obtains the detection box of the human body, and a two-dimensional human pose estimation model extracts the skeletal keypoints within that box. The resulting keypoint sequences are then fed into the AA-GCN action recognition model, which classifies tennis actions into six categories: serve, forehand, backhand, high-pressure shot (overhead smash), chip, and net interception (volley). The classified actions are evaluated with an improved dynamic time warping (DTW) algorithm. Together these steps form a hybrid model of pose estimation, action recognition, and action evaluation for assessing the tennis actions of students. The model can greatly reduce the workload and difficulty of tennis teaching, effectively improve teaching quality, and make tennis teaching more quantitative and automated.

2

Construction and processing of data sets

2.1

Skeletal keypoint extraction based on OpenPose

This subsection improves the OpenPose pose estimation model to obtain the 2D coordinates and confidence of each skeletal keypoint in the tester’s tennis actions, and processes the skeleton information to lay the foundation for recognition and classification by a convolutional neural network in subsequent experiments, ultimately determining whether a technical action is performed to standard. The 2D coordinates of the skeletal keypoints are used because skeletal data offers advantages for the accuracy of action recognition. The model is further improved so that recognition can run in real time, as required by the tennis technical action recognition scenario studied here.

2.1.1

Model for human posture estimation

1)

OpenPose posture estimation model

OpenPose is an open source project in human posture modeling, which is suitable for single as well as multi-person scenarios, and has a broad application prospect in the fitness field and action recognition field. The model mainly detects the relationship between the joints of the human body to determine the range and direction of motion of each joint, so as to estimate human posture. The basic idea is to represent the human body as a vector composed of a series of local coordinate systems, and then use an optimization algorithm to find the optimal posture parameters.

Figure 1 shows the model structure of OpenPose. An RGB image is taken as input, and the VGG19 convolutional architecture extracts a feature map F from it. F is then processed by multiple convolutional stages, each of which outputs S_t and L_t, where S_t denotes the confidence maps of body keypoints such as the head and shoulders, and L_t denotes the direction of each pixel along the skeleton, i.e. along parts such as the torso and limbs. After each stage, the feature map F is fused with S_t and L_t and passed to the next stage. As the number of stages increases, S_t can distinguish the left and right parts of the body, and the model finally outputs the 2D locations of the body keypoints in the image.

2)

OpenPose-bm pose estimation model

The original OpenPose model requires a high-end GPU. To further improve the real-time performance of pose detection, this paper modifies the original OpenPose model following Daniil Osokin’s idea, builds the OpenPose-bm pose estimation model in the TensorFlow development environment, and uses it to extract the skeletal information in the dataset.

2.1.2

Skeletal point data preprocessing

In this subsection, the processed dataset is fed into the designed human pose estimation model to obtain the coordinates of the human keypoints, laying the foundation for the subsequent network training and action recognition. The subsection covers the index order of the human keypoints and the processing of the data, including removing redundant keypoints, establishing a coordinate system, completing missing keypoints, and extracting features.

1)

Human body key point detection

The OpenPose-bm model studied in this paper adopts the skeletal keypoint annotation format of the COCO dataset, which covers 18 skeletal keypoints of the human body.

2)

Remove redundant key points

Since this paper studies the classification of tennis technical movements, the main parts to be recognized are the torso and limbs. Therefore, the 5 head keypoints in the output are removed in this experiment.

3)

Establishment of 2D coordinate system

Following the previous processing of the dataset, OpenCV converts each video into a 65-frame RGB image sequence, which is fed into OpenPose-bm. The resulting 2D skeletal coordinates are saved to a txt file, with the position of each keypoint represented by X and Y as the horizontal and vertical coordinates, respectively.

Let V(n), n = 1, 2, 3, …, N, denote the n-th frame of an RGB video with a resolution of 1080 and N consecutive frames, and let the skeletons detected in V(n) be written as S(n, p), p = 1, 2, 3, …, P. Each skeleton S(n, p) has 13 joints, and the 2D position of the t-th joint is given by:

$X^{t}(n, p) := \{ (x^{t}(n, p), y^{t}(n, p)) : x^{t}(n, p) \in [0, W],\ y^{t}(n, p) \in [0, H] \}$ (1)

where W and H are the width and height of the frame.

4)

Completing the lost key points

Since the dataset is captured by camera, environmental noise, occlusion by cluttered objects, and occlusion caused by the shooting angle can lead to the loss of some keypoints, so that the complete skeleton cannot be obtained and the action evaluation is affected. An algorithm is therefore needed to fill in the missing keypoints. In this paper, missing keypoints are filled according to the difference between the frames before and after the gap, and the horizontal and vertical coordinates are calculated as:

$x_{s_{curr}} = x_{s_{pre}} + d \left( \frac{x_{s_{pre}} - x_{s_{las}}}{e} \right)$ (2)

$y_{s_{curr}} = y_{s_{pre}} + d \left( \frac{y_{s_{pre}} - y_{s_{las}}}{e} \right)$ (3)

where $x_{s_{curr}}$ and $y_{s_{curr}}$ represent the horizontal and vertical coordinates of keypoint s in the frame currently being filled, $x_{s_{pre}}$ and $y_{s_{pre}}$ represent its coordinates in the frame before the vacancy, $x_{s_{las}}$ and $y_{s_{las}}$ represent its coordinates in the frame before the end of the vacancy, d represents the position of the requested frame within the vacancy, and e represents the total number of vacant frames.

The movement action is continuous, so the filled data is closer to the real skeleton information characteristics and conforms to the movement pattern.
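Steps 2) and 4) above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: it assumes the usual OpenPose COCO-18 keypoint order (nose, neck, arms, legs, then eyes and ears), which the text does not spell out, and it uses plain linear interpolation between the valid frames on either side of a gap as a stand-in for Eqs. (2) and (3), whose exact sign and denominator conventions may differ. The names `COCO18_NAMES`, `drop_head_keypoints`, and `fill_missing` are illustrative.

```python
# Assumed OpenPose COCO-18 keypoint order (not stated in the text).
COCO18_NAMES = [
    "nose", "neck",
    "r_shoulder", "r_elbow", "r_wrist",
    "l_shoulder", "l_elbow", "l_wrist",
    "r_hip", "r_knee", "r_ankle",
    "l_hip", "l_knee", "l_ankle",
    "r_eye", "l_eye", "r_ear", "l_ear",
]
HEAD_NAMES = {"nose", "r_eye", "l_eye", "r_ear", "l_ear"}
KEEP_INDICES = [i for i, name in enumerate(COCO18_NAMES)
                if name not in HEAD_NAMES]


def drop_head_keypoints(frame):
    """Return the 13 torso and limb keypoints of one 18-keypoint frame.

    `frame` is a list of 18 (x, y, confidence) tuples.
    """
    if len(frame) != len(COCO18_NAMES):
        raise ValueError("expected 18 keypoints per frame")
    return [frame[i] for i in KEEP_INDICES]


def fill_missing(values):
    """Linearly interpolate runs of None in one coordinate track.

    `values` holds one coordinate (x or y) of one keypoint per frame.
    Gaps at the start or end of the track have a valid neighbour on one
    side only and are left as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i                      # first missing frame of the gap
        while i < n and filled[i] is None:
            i += 1
        end = i                        # first valid frame after the gap, or n
        if start == 0 or end == n:
            continue                   # no valid frame on one side
        before, after = filled[start - 1], filled[end]
        steps = end - start + 1        # intervals between the two valid frames
        for d in range(1, steps):
            filled[start - 1 + d] = before + d * (after - before) / steps
    return filled
```

For example, `fill_missing([0.0, None, None, 3.0])` returns `[0.0, 1.0, 2.0, 3.0]`. Each keypoint coordinate is filled independently, which matches the per-keypoint form of the equations.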

2.1.3

Feature extraction

After the skeleton data is processed, features must be extracted before the data can be fed into the classification model for training. Commonly used feature extraction methods include the neck reference method, the pose-to-angle conversion method, and the coordinate normalization method. In this paper, the coordinate normalization method is used.

This method converts the coordinate system by taking the upper-left point of the skeleton bounding box in the original image as the coordinate origin, with the horizontal axis pointing right and the vertical axis pointing down. Since the coordinates vary with factors such as the distance to the camera and the position of the human body in the image, the coordinate normalization method is used to unify the coordinates so that the data can be fed into the subsequent classification network for action recognition.

After coordinate normalization, the horizontal and vertical coordinates of each keypoint in the new coordinate system are calculated as:

$x_{s_{curr}} = x_{s_{orig}} + e$ (4)

$y_{s_{curr}} = y_{s_{orig}} + d$ (5)

where e represents the translation of the coordinate origin of the original image along the x-axis and d represents the translation along the y-axis.
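The translation described above can be illustrated with a short Python sketch. It assumes that every keypoint in the frame is present and that the bounding box is simply the tightest axis-aligned box around the keypoints; the function name `normalize_to_bbox` is hypothetical.

```python
def normalize_to_bbox(points):
    """Shift (x, y) keypoints so the bounding-box upper-left corner is (0, 0).

    `points` is a list of (x, y) tuples in image coordinates, with x growing
    to the right and y growing downward.
    """
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    e, d = -min_x, -min_y              # translations along x and y
    return [(x + e, y + d) for x, y in points]
```

Because only a translation is applied, the same pose detected at two different image positions yields the same normalized coordinates, which is the property the classification network relies on.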

2.2

Overall network design

In this paper, we propose AA-GCN, a skeleton action recognition network that combines second-order skeletal and motion information, an attention mechanism, and a channel topology refinement module, with SGN as the baseline network. The network structure is shown in Fig. 2 [19]. Its components are the skeletal-motion module, the topology refinement graph convolutional feature fusion module CTFC-GCN, the spatio-temporal feature augmented attention module ST-ATT, and the temporal flow module.

The skeletal-motion (Bone-Motion) module is a data processing module that converts the raw joint information into second-order bone information (Bone) and motion flow information (Velocity), and uses a two-layer convolutional network to expand the 3-channel bone and motion information to 64 channels. The summed and fused features are concatenated (CAT) with the high-level features encoded from the original joints and fed into three layers of channel topology refinement graph convolution (CTR-GC) for channel learning. After each CTR-GC, 1×1 and 1×3 convolutions that aggregate channel features are added, forming the CTFC-GCN module. After the spatial flow module, the ST-ATT attention mechanism refines the positional and temporal features learned by the three-layer channel topology graph convolution, strengthens the channel representation of the joint features, and extracts more information for feature learning in the temporal flow module. In the temporal flow module, an encoding of temporal features is first incorporated, and global max pooling is applied so that only temporal features remain; these are fed into a two-layer convolutional network for temporal feature learning. Finally, global max pooling is performed again and the learned channel features are passed to a fully connected layer for human action recognition.

2.3

Network Architecture Improvements

2.3.1

Skeletal-motion module

In previous studies, data preprocessing mostly used a single type of joint data or trained the network separately on multiple input streams, which improves accuracy only at a high cost [20]. In this paper, to model second-order features, a two-layer 2D convolution is first used to extract features and convert the number of channels, and the extracted high-level features are then summed and fused.

The second-order skeletal information and second-order motion information of the skeleton data are obtained as follows. Let the initial joint coordinates input to the network be $x \in \mathbb{R}^{N \times C \times T \times V}$, where N, C, T, and V represent the input batch size (batch_size), the number of input channels, the number of input frames, and the number of input 3D joint points, respectively:

$B_{i_{1}} = x[:, :, :, i_{1}] - x[:, :, :, i_{2}]$ (6)

$V = x[:, :, 1:, :] - x[:, :, :-1, :]$ (7)

where $i_{1}$ and $i_{2}$ index the two joint keypoints, and V in Eq. (7) denotes the motion (velocity) information rather than the number of joints.

Since the last dimension is the joint dimension and the third is the frame dimension, the motion information is obtained by subtracting the joint positions in the first through second-to-last frames from those in the second through last frames. After the above equation, it is therefore necessary to pad a zero at the first frame (i.e., the motion information of the first frame is 0) to obtain the complete motion information.
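The bone and motion streams described above can be sketched with NumPy as follows. This is an illustration under assumptions: the `(child, parent)` joint pairs are hypothetical and would have to follow the actual 13-joint skeleton, and joints that do not appear as a child simply keep a zero bone vector.

```python
import numpy as np


def bone_and_motion(x, pairs):
    """Compute bone and motion information from joint coordinates.

    x     : array of shape (N, C, T, V) = (batch, channels, frames, joints)
    pairs : iterable of (child, parent) joint indices (hypothetical layout)
    Returns (bone, motion), each with the same shape as x.
    """
    bone = np.zeros_like(x)
    for child, parent in pairs:
        # Bone vector: difference of two joints along the last (joint) axis.
        bone[:, :, :, child] = x[:, :, :, child] - x[:, :, :, parent]

    motion = np.zeros_like(x)
    # Frame t minus frame t-1 for t >= 1; frame 0 keeps zero motion.
    motion[:, :, 1:, :] = x[:, :, 1:, :] - x[:, :, :-1, :]
    return bone, motion
```

Writing the differences into pre-allocated zero arrays keeps both outputs the same shape as the input, so they can be passed through the same convolutional layers and summed.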

2.3.2

Topology Refinement Graph Convolutional Feature Fusion Module

Channel topology refinement graph convolution (CTR-GC) is shown in Fig. 3 and is divided into three parts: channel topology refinement, feature conversion, and channel-wise aggregation. In the channel topology refinement part, the input features are first converted using the functions Φ and Ψ, which transform the channel dimension; the activation function M then turns the two converted features into an N×N matrix Q that models the learned correlation between joints. In this paper, three non-shared topology matrices A are constructed, representing the self-connection of each joint across different frames, the connections between different joints and the center joint in the same frame, and the connected joints in the same frame. The R matrices are obtained from the following equation, where α is a trainable parameter:

$R = A + \alpha \cdot Q$ (8)

The feature conversion part converts the input features into high-level features by 2D convolution. In the channel aggregation part, the R matrix of each channel is combined with the converted features of that channel using the einsum function, i.e. the Einstein summation operation, to obtain the final output.

The structure of CTFC-GCN is shown in Fig. 4. It consists of the three CTR-GC topologies described above together with 1 × 1 and 3 × 3 feature aggregation convolution modules. A residual connection is added to make the training of the recognition network more stable.
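The channel aggregation step can be illustrated with `numpy.einsum`. The sketch below assumes that R has already been computed and stores one joint-by-joint matrix per sample and channel, and that the converted features have shape (batch, channels, frames, joints); the function name `channel_aggregate` and the index layout are assumptions, not details given in this paper.

```python
import numpy as np


def channel_aggregate(R, feats):
    """Aggregate joint features with a channel-wise topology.

    R     : (N, C, V, V), one V x V joint topology per sample and channel
    feats : (N, C, T, V), converted features
    Returns an array of shape (N, C, T, V) whose entry [n, c, t, u] is
    sum over v of R[n, c, u, v] * feats[n, c, t, v].
    """
    return np.einsum("ncuv,nctv->nctu", R, feats)
```

With an identity topology each joint only sees itself and the features pass through unchanged; with an all-ones topology every joint receives the sum over all joints. The einsum string applies a different topology to each channel without an explicit loop.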

2.3.3

Temporal Attention Module

The ST-Joint Attention in this paper extracts features by two-dimensional convolution along the joint dimension and the time dimension respectively, obtaining different weights for each channel:

$x_{t\_att} = W_{2}(\mathrm{relu}(\mathrm{BN}(W_{1} V_{1} + b_{1}))) + b_{2}$ (9)

$x_{v\_att} = W_{2}(\mathrm{relu}(\mathrm{BN}(W_{1} V_{2} + b_{1}))) + b_{2}$ (10)

where V₁ and V₂ are matrices of size N × C × T and N × C × V obtained by compressing the joint dimension and the time dimension, respectively. The two are combined as:

$x_{att} = x_{t\_att} \times x_{v\_att}$ (11)

The attention weight of ST-Joint Attention can be obtained by the above equation.
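A rough NumPy sketch of the attention computation above is given here for orientation only. Several details are assumptions that the text does not specify: the joint and time dimensions are compressed by averaging, W₁ and W₂ are treated as pointwise (1×1) channel-mixing weights, batch normalization uses plain batch statistics without learned scale and shift, each branch receives its own parameter tuple, and the product of the two branches is broadcast to shape N × C × T × V.

```python
import numpy as np


def attention_branch(v, w1, b1, w2, b2, eps=1e-5):
    """Compute W2(relu(BN(W1 v + b1))) + b2 for v of shape (N, C, L).

    w1: (D, C), b1: (D,), w2: (C, D), b2: (C,). Returns shape (N, C, L).
    """
    h = np.einsum("dc,ncl->ndl", w1, v) + b1[None, :, None]
    # Simplified batch norm: normalise each channel over batch and length.
    mean = h.mean(axis=(0, 2), keepdims=True)
    var = h.var(axis=(0, 2), keepdims=True)
    h = (h - mean) / np.sqrt(var + eps)
    h = np.maximum(h, 0.0)                       # ReLU
    return np.einsum("cd,ndl->ncl", w2, h) + b2[None, :, None]


def st_joint_attention(x, params_t, params_v):
    """Return attention weights of shape (N, C, T, V) for x of that shape."""
    v1 = x.mean(axis=3)                          # compress joints -> (N, C, T)
    v2 = x.mean(axis=2)                          # compress frames -> (N, C, V)
    x_t_att = attention_branch(v1, *params_t)
    x_v_att = attention_branch(v2, *params_v)
    # Outer product over the time and joint axes, per sample and channel.
    return x_t_att[:, :, :, None] * x_v_att[:, :, None, :]
```

The broadcasted product gives every (frame, joint) position a weight that factorizes into a temporal part and a joint part, which keeps the number of attention parameters small.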

2.4

Overall process

The model in this paper is built from three parts: the OpenPose-bm pose estimation model, the skeleton action recognition network AA-GCN, and the CTFC-GCN structure. To implement the complete tennis action recognition process, the dependency libraries of the algorithms must be configured in the same Linux environment, so as to build a complete environment for semantic skeleton information acquisition and skeleton action recognition [21]. To unify the environment of the three algorithms, this paper uses Ubuntu18.04, python3.7, and a GTX3070Ti GPU, with the corresponding CUDA10.0 installed and a matching version of CUDNN used to accelerate deep learning. On top of the action recognition implementation, libraries such as opencv and ffmpeg are used to save and visualize the video data.

The overall process of skeleton action recognition combining semantic information is shown in Fig. 5. The input of the model is an RGB video stream. The OpenPose-bm algorithm performs human pose estimation and writes a lightweight json file containing the total number of frames and the 2D coordinates of the 13 joints (the 5 redundant keypoints are removed) in each frame. Based on prior information, the connectivity of the joints is constructed, and the data is divided into training and validation sets and fed into the AA-GCN skeleton network for action recognition. To counter the exploding and vanishing gradients that occur during training and make the model hard to fit, BN layers normalize the inputs and ReLU provides the nonlinearity. Because one-handed and two-handed backhands are difficult to distinguish at the skeleton level, a normalized-threshold decision based on target detection is combined with the skeleton result to improve the accuracy.

This paper focuses on three types of hitting actions: forehand (Force hand), one-handed backhand (Back hand1), and two-handed backhand (Back hand2). The recognized output is visualized as the video frames of the skeleton stream together with the action categories.

3

Tennis sports informatization training system design

Based on the preceding research on human action recognition in sports scenarios, this chapter designs and implements a tennis action recognition and analysis system based on video-based human pose estimation, in order to evaluate the performance of the proposed method in practical engineering applications.

This chapter mainly introduces the core model design of the system and the implementation of the system workflow. The system processes an input video of tennis movements, recognizes the movements in the video, analyzes and evaluates them, and outputs the corresponding scores together with suggestions for improvement. It can be used to raise the technical level of tennis players, improve training results, and evaluate individual performance.

3.1

System architecture

According to the requirement analysis and the dependency relationship between technical solutions, the functional modules related to the guidance of tennis action recognition and analysis in the system architecture are introduced, including the following:

1)

Video Input Module

This module collects video data of tennis players, either directly from the video stream of a camera or from video recorded by other capture devices. The collected video needs a certain resolution and frame rate to ensure the accuracy of the subsequent analysis. Each input video segment must be long enough to contain at least one complete execution of the tennis sub-movement.

2)

Target detection and tracking module

The main function of this module is to detect the people in the frame and keep the current target person consistent across consecutive video frames, so that the sequence of human pose points obtained subsequently is not disturbed by non-target people or objects and the continuity of the target person’s movements is preserved.

3)

Human Body Posture Estimation Module

This module is mainly used to estimate the human body posture in the video, and the human body posture estimation method OpenPose-bm proposed in subsection 2.1 is used in the system for posture estimation. This module can output and save the coordinate information of human skeleton pose points for subsequent action recognition analysis.

4)

Tennis action recognition module

This module is mainly used to recognize and analyze the movements of tennis players in the video, and the system uses the skeleton movement recognition network AA-GCN method proposed in subsection 2.2 for recognition, which classifies and recognizes the movements of tennis players based on the coordinate information of the key points of the human skeleton.

5)

Data storage and visualization module

This module is mainly used to persist the analysis results and display the identified movement types, movement scores, points where the movement needs to be improved and the frequency of occurrence of the movement types on the screen in a visualized way.

Figure 6 illustrates the overall design diagram of the system based on the overall system architecture, in which the input video in the video input module contains two categories, one for the input standard tennis movements and one for the student movements to be detected and evaluated. The data storage and visualization module mainly stores the input video data, the movement posture information of the characters, and the results related to movement analysis and evaluation.

3.2

System functions

3.2.1

Tennis Motion Recognition Module

In order to recognize tennis actions in open scenes, the tennis action recognition and evaluation module uses the target detection algorithm Faster R-CNN and the target tracking algorithm ByteTrack to keep the target person consistent for the duration of the action. Then OpenPose-bm, the human pose estimation method proposed in subsection 2.1, extracts the skeletal pose point features of the target person to obtain the sequence of skeletal pose points in the video. Finally, an action classification algorithm is applied, the action similarity is calculated with dynamic time warping (DTW), and the result is combined with a real-time action recognition method based on manual rules to provide recognition, scoring, and guidance for tennis actions.
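As an illustration of the similarity step, the following Python sketch computes the classic DTW cost between two pose sequences. It is the textbook algorithm with a Euclidean frame distance, not the improved variant used in this paper, and the function name `dtw_distance` and the per-frame feature layout are assumptions.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences of feature vectors.

    Each element of seq_a and seq_b is a tuple of numbers of the same
    length (for example flattened keypoint coordinates of one frame).
    A lower cost means the two movements are more similar.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = seq_a[i - 1], seq_b[j - 1]
            d = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
            cost[i][j] = d + min(cost[i - 1][j],      # seq_b frame repeated
                                 cost[i][j - 1],      # seq_a frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because DTW allows frames to be repeated, a student movement performed more slowly than the standard movement can still obtain a cost of zero if the poses themselves match, which is why it suits comparing strokes of different durations.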

3.2.2

Tennis movement training instruction module

After tennis movements such as the serve, forehand, backhand, chip, and high-pressure shot are recognized, the posture of the target person in each frame is compared with the standard and targeted guidance is provided. The key points of each action are converted into angular relationships between the skeletal pose points in the frame, forming manual rules for fine-grained tennis actions, and the angles of the relevant pose points of the target person are compared one by one against these angle rules. Because a fine-grained action lasts only a very short time (2-5 seconds), and at the finer level of action elements the pose features are clearly distinguishable, the system uses KNN to classify which action element the current input frame belongs to; each action element corresponds to the angle calculations of the pose points specified in the rules. In line with how tennis is actually played, the system uses the vector angle formula to calculate the angle between pose points.

Fig. 7 shows a schematic diagram of the calculation of the angular features of the target person. The pose points on the left arm contain three points, A(x_i, y_i), B(x_j, y_j), and C(x_k, y_k), representing the left shoulder, the left elbow, and the left wrist, respectively. These three points form the action angle θ of the left arm, which is calculated as follows:

$\cos θ = \frac{\vec{BA} \cdot \vec{BC}}{|\vec{BA}| \, |\vec{BC}|}$ (12)

where x is the horizontal coordinate, y is the vertical coordinate, the subscripts i, j, k denote different skeletal pose points, the numerator is the dot product of the vectors BA and BC, and |BA| is the modulus of the vector. Based on the tennis movement essentials, the above formula can be used to calculate the angles of the left arm, right arm, lower limbs, and other limbs of the human body.
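The angle calculation above can be written as a small Python function. This is a minimal sketch; the function name `joint_angle` and the choice to return degrees are illustrative.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at vertex b formed by 2D points a, b, c.

    Implements cos(theta) = (BA . BC) / (|BA| |BC|).
    """
    bax, bay = a[0] - b[0], a[1] - b[1]
    bcx, bcy = c[0] - b[0], c[1] - b[1]
    norm = math.hypot(bax, bay) * math.hypot(bcx, bcy)
    if norm == 0.0:
        raise ValueError("degenerate angle: coincident points")
    cos_theta = (bax * bcx + bay * bcy) / norm
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding
    return math.degrees(math.acos(cos_theta))
```

A fully extended arm, with the three points on one line, gives an angle of 180 degrees, while an arm bent at a right angle gives 90 degrees. Comparing such angles with the ranges in the manual rules is what turns a pose into concrete feedback.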

The summary of tennis movement essentials is mainly derived from the tennis-related scientific training counseling materials made public by the Tennis Center of the General Administration of Sport, as well as the Chinese youth tennis training syllabus.

3.2.3

Data storage and visualization modules

After tennis action recognition, the system stores in real time the recognized human pose, the action category of the video, the action score, and the details of the rule-based action evaluation, according to the tennis video data acquired in real time. The human pose point data generated from the video and the action evaluation details are stored in a MySQL database to persist the human pose and action data.

To help players discover their own posture problems in real time against the video, the system renders the type of sub-movement, the movement score, whether the movement is standard, and the details of the rules corresponding to the current movement onto the corresponding movement video in real time. Deployment of the project and feedback from actual use show that the system can effectively reduce the workload and difficulty of tennis coaches, improve the quality of tennis teaching, and make tennis teaching more quantitative and scientific.

4

Results and analysis of the application of the information-based training system

4.1

Application of Informationized Training System for Tennis Sports

This experiment mainly tests the change in hand pressure during forehand strokes and is designed so that a wider range of players can be tested. A high-level tennis player served as the standard test subject. The experimental procedure is as follows:

1)

Fix the arrayed flexible sensor to the surface of the racket grip and attach the data acquisition circuit to the tester.

2)

Fix the IMU under the neck of the tennis racket.

3)

Turn on the host computer to display the pressure curves of the five points and the three-axis acceleration curve of the racket in real time.

4)

The player takes the ready posture without the hitting hand touching the racket, and holds a relatively stable posture for about 5 s, which makes it easier to synchronize the two data streams later.

5)

Hit the ball several times in succession and observe the pressure curve and acceleration curve on the host computer.

6)

After hitting the ball, save the corresponding pressure data and acceleration data.

The three-axis acceleration waveforms of the racket during hitting are displayed in real time on the host computer, as shown in Figure 8. Negative values indicate acceleration in the opposite direction. It can be seen that a high-level tennis player can reach maxima of 9.35, 8.01, and 11.87 m/s² on the X, Y, and Z axes, respectively, while hitting the ball.

Figure 9 shows the three-axis acceleration curve for forehand strokes. The number of strokes can be calculated by counting the number of peaks in the three-axis (X, Y, Z) acceleration curve; counting the peaks gives 25 strokes for this session. Analyzing the overall three-axis acceleration waveforms shows that a correct hitting action produces stable waveforms with large peaks. Closer analysis shows that the ratio of positive to negative values on each axis is relatively fixed. For the Y-axis acceleration, the negative part of the waveform is larger than the positive part, presumably because in the lead-in phase the racket travels only a short distance in the Y-axis direction and thus generates a small swing acceleration, whereas in the acceleration phase the racket travels further and generates a larger swing acceleration. For the Z-axis acceleration, the negative part of the waveform is also slightly larger than the positive part, presumably because the racket travels a longer distance in the horizontal direction during the acceleration phase than during the lead-in phase, generating a correspondingly larger acceleration. These conjectures are based on how the human body actually moves. Therefore, the racket needs to move a shorter distance and maintain a smaller acceleration in the Y-axis direction in the lead-in phase, while moving a longer distance and generating more speed in the acceleration phase. During the acceleration phase in the horizontal direction, the racket needs to move forward a longer distance to generate more acceleration and produce a better stroke.

Figure 10 shows the triaxial acceleration curves and pressure curves during a single forehand stroke. The forehand can be divided into four phases: backswing lead-in, acceleration, stroke, and follow-through. Analyzing the pressure curves at the five points gives the following account of correct power delivery in a tennis stroke:

1)
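Counting strokes from acceleration peaks can be sketched as follows. The text does not state how peaks are detected, so the amplitude threshold and the minimum spacing between peaks in this Python sketch are hypothetical parameters that would need tuning for the actual sensor and sampling rate.

```python
def count_peaks(signal, threshold, min_gap):
    """Count local maxima of `signal` that reach `threshold`.

    signal    : list of acceleration samples from one axis
    threshold : minimum sample value for a peak to count (assumed parameter)
    min_gap   : minimum number of samples between two counted peaks, so
                that the vibration after impact is not counted again
    """
    count = 0
    last_peak = None
    for i in range(1, len(signal) - 1):
        is_peak = signal[i - 1] < signal[i] >= signal[i + 1]
        if not is_peak or signal[i] < threshold:
            continue
        if last_peak is not None and i - last_peak < min_gap:
            continue
        count += 1
        last_peak = i
    return count
```

In practice the same count would be checked on more than one axis, since a genuine stroke produces a large peak on all three axes while sensor noise usually does not.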

In the back swing lead phase (Stage 1), the athlete pulls the racket behind the body. At the beginning of this stage, the pressure at the key position points is low due to the relaxed state of the hand. In the later stages, the racket is gradually raised and the hand pressure slowly increases.

2)

In the acceleration phase (Stage 2), the racket moves downward in a direction perpendicular to the ground, then upward, touching the ball during the upward movement. In the direction parallel to the ground, the racket moves from back to front, and the trajectory of the whole acceleration stage is similar to an inverted “C” shape. During the acceleration phase, as the swing speed increases, the racket is gradually gripped tighter and the pressure on the hand increases.

3)

At the very beginning of the stroke (Stage 3), a moment before the racket touches the ball, the pressure on the hand is maximized in preparation for hitting the ball. When the racket vibrates during the stroke, there is a momentary loosening between the hand and the racket, resulting in a decrease and then an increase in pressure.

4)

In the follow-through phase (Stage 4), the ball leaves the racket, but there is inertia after the racket moves at high speed, and the racket continues to move forward and eventually moves to the left side of the body. As the hand slowly relaxes after hitting the ball, the pressure gradually decreases.

The pressure curves show that the athlete’s force generation follows the standard pattern: the pressure reaches its maximum at the very beginning of the hitting phase and stays at lower values in the other phases. Taken together, this indicates that the athlete is an advanced-level tennis player.
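The check described above, that the grip pressure peaks at the start of the stroke phase, can be sketched as follows. The function name `phase_of_peak_pressure`, the pressure samples, and the phase boundaries are hypothetical and serve only to illustrate the idea.

```python
def phase_of_peak_pressure(pressure, phase_bounds):
    """Return the name of the phase that contains the maximum pressure sample.

    phase_bounds maps a phase name to (start, end) sample indices,
    with end exclusive. Returns None if no phase covers the peak.
    """
    peak_index = max(range(len(pressure)), key=lambda i: pressure[i])
    for name, (start, end) in phase_bounds.items():
        if start <= peak_index < end:
            return name
    return None

# Hypothetical pressure samples for one forehand (arbitrary units).
pressure = [1, 1, 2, 2, 3, 4, 5, 9, 6, 7, 4, 3, 2, 1]
phases = {
    "back swing lead": (0, 4),
    "acceleration": (4, 7),
    "stroke": (7, 10),
    "follow-through": (10, 14),
}
print(phase_of_peak_pressure(pressure, phases))  # prints stroke
```

In this made-up trace the drop from 9 to 6 and the rise back to 7 mimic the momentary loosening of the grip when the racket vibrates at impact.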

4.2

Analysis of the experimental results of the tennis action recognition model

A confusion matrix is a matrix used to measure the performance of a classification model. It shows how the model’s predictions differ from the true labels for each class. Confusion matrices are used in binary and multiclass classification problems; here they are used to calculate the precision, recall, F1 score, and other metrics needed to evaluate the models.
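As a minimal sketch of how these metrics follow from a confusion matrix, the snippet below uses the conventional definitions: precision divides the correct predictions for a class by the number of predictions of that class, and recall divides them by the number of true samples of that class. The 3×3 matrix and the function name `per_class_metrics` are hypothetical and are not taken from the experiments.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.

    Returns (overall accuracy, [(precision, recall, f1), ...] per class).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1))
    return correct / total, metrics

# Hypothetical three-class confusion matrix, 10 true samples per class.
cm = [[9, 1, 0],
      [3, 6, 1],
      [0, 1, 9]]
accuracy, metrics = per_class_metrics(cm)
print("accuracy %.4f" % accuracy)
for precision, recall, f1 in metrics:
    print("%.4f %.4f %.4f" % (precision, recall, f1))
```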

With AA-GCN as the target detection model and OpenPose-bm as the human pose estimation model, the human skeletal key point dataset without fine-grained processing was also fed into the ST-GCN, AGCN, and AA-GCN models for training, in order to show the effect of fine-grained processing of the tennis skeletal key point dataset. Comparison experiments tested the three algorithms on tennis movement classification using a test set of six types of basic tennis technical movements, with 30 videos per type and 180 tennis action videos in total. The test results of the three skeletal keypoint-based action recognition models without the fine-grained dataset are shown in Figure 11 and Tables 1–3, where A–F denote serve, forehand stroke, backhand stroke, high-pressure shot, chip, and net interception, respectively. Without fine-grained processing, the overall accuracies of the ST-GCN, AGCN, and AA-GCN models are 0.6833, 0.7056, and 0.7222, respectively.

Table 1.

Test results of ST-GCN model without fine-grained data sets

Category	True sample	Prediction sample	Correct classification	Accuracy rate	Precision rate	Recall rate	F1-Score
A	30	20	20	0.6833	0.6667	1.0000	0.8000
B	30	36	21		0.7000	0.5833	0.6363
C	30	34	21		0.7000	0.6176	0.6562
D	30	28	24		0.8000	0.8571	0.8276
E	30	35	18		0.6000	0.5143	0.5539
F	30	27	19		0.6333	0.7037	0.6666
Total	180	180	123		-	-	-
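The columns of Table 1 can be reproduced from the three counts in each row. The relations used below are inferred from the tabulated values rather than stated in the text: the "Precision rate" column equals Correct classification divided by True sample, the "Recall rate" column equals Correct classification divided by Prediction sample, and the accuracy is the total number of correct classifications divided by the 180 test videos. Many references name these two ratios the other way round; the F1-Score is the same under either naming. The helper name `row_metrics` is hypothetical.

```python
# (true samples, predicted samples, correctly classified) per class, from Table 1
table1 = {
    "A": (30, 20, 20), "B": (30, 36, 21), "C": (30, 34, 21),
    "D": (30, 28, 24), "E": (30, 35, 18), "F": (30, 27, 19),
}

def row_metrics(true_n, pred_n, correct):
    over_true = correct / true_n            # tabulated as "Precision rate"
    over_pred = correct / pred_n            # tabulated as "Recall rate"
    f1 = 2 * correct / (true_n + pred_n)    # harmonic mean of the two ratios
    return over_true, over_pred, f1

total_correct = sum(c for _, _, c in table1.values())
total_samples = sum(t for t, _, _ in table1.values())
overall_accuracy = total_correct / total_samples

for label, counts in table1.items():
    print(label, ["%.4f" % v for v in row_metrics(*counts)])
print("accuracy %.4f" % overall_accuracy)
```

Small differences in the last digit relative to the table (for example in rows B and E) come from rounding.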

Table 2.

Results of AGCN model tests without fine-grained data sets

Category	True sample	Prediction sample	Correct classification	Accuracy rate	Precision rate	Recall rate	F1-Score
A	30	27	18	0.7056	0.6000	0.6667	0.6316
B	30	34	22		0.7333	0.6471	0.6875
C	30	33	24		0.8000	0.7273	0.7619
D	30	26	20		0.6667	0.7692	0.7143
E	30	35	22		0.7333	0.6286	0.6769
F	30	25	21		0.7000	0.8400	0.7636
Total	180	180	127		-	-	-

Table 3.

Results of AA-GCN model tests without fine-grained data sets

Category	True sample	Prediction sample	Correct classification	Accuracy rate	Precision rate	Recall rate	F1-Score
A	30	33	21	0.7222	0.7000	0.6364	0.6667
B	30	35	27		0.9000	0.7714	0.8308
C	30	31	21		0.7000	0.6774	0.6885
D	30	25	19		0.6333	0.7600	0.6909
E	30	31	26		0.8667	0.8387	0.8525
F	30	25	16		0.5333	0.6400	0.5818
Total	180	180	130		-	-	-

The test results of the three skeletal keypoint-based action classification models using the fine-grained datasets are shown in Figure 12 and Tables 4–6. Without fine-grained processing of the dataset, the accuracy, precision, and recall of all three models are low, indicating that the models have difficulty with coarse-grained tennis sub-actions. With the fine-grained skeletal keypoint dataset of tennis sub-movements, the accuracy, precision, recall, and F1 values of all three models improve substantially. For the AA-GCN model, for example, the overall accuracy rises from 0.7222 to 0.8889. Its per-class precision rate (Table 3 versus Table 6) rises from 0.7000 to 0.9333 for the serve (A), from 0.9000 to 0.9333 for the forehand stroke (B), from 0.7000 to 0.9333 for the backhand stroke (C), from 0.6333 to 0.8333 for the high-pressure shot (D), and from 0.5333 to 0.9000 for net interception (F), while for the chip (E) it falls from 0.8667 to 0.8000. Recall improves for every class, and the F1 value improves for every class except the chip (E). The decrease for the chip (E) is presumably caused by motion blur in the video frames, but it does not affect the overall result.

Table 4.

Test results of ST-GCN model using fine-grained data sets

Category	True sample	Prediction sample	Correct classification	Accuracy rate	Precision rate	Recall rate	F1-Score
A	30	33	25	0.7944	0.8333	0.7576	0.7936
B	30	30	23		0.7667	0.7667	0.7667
C	30	33	26		0.8667	0.7879	0.8254
D	30	27	21		0.7000	0.7778	0.7369
E	30	29	24		0.8000	0.8276	0.8136
F	30	28	24		0.8000	0.8571	0.8276
Total	180	180	143		-	-	-

Table 5.

Results of AGCN model tests using fine-grained data sets

Category	True sample	Prediction sample	Correct classification	Accuracy rate	Precision rate	Recall rate	F1-Score
A	30	33	26	0.8278	0.8667	0.7879	0.8254
B	30	32	25		0.8333	0.7813	0.8065
C	30	31	26		0.8667	0.8387	0.8525
D	30	27	25		0.8333	0.9259	0.8772
E	30	28	21		0.7000	0.7500	0.7241
F	30	29	26		0.8667	0.8966	0.8814
Total	180	180	149		-	-	-

Table 6.

Test results of AA-GCN model using fine-grained data sets

Category	True sample	Prediction sample	Correct classification	Accuracy rate	Precision rate	Recall rate	F1-Score
A	30	32	28	0.8889	0.9333	0.8750	0.9032
B	30	32	28		0.9333	0.8750	0.9032
C	30	32	28		0.9333	0.8750	0.9032
D	30	27	25		0.8333	0.9259	0.8772
E	30	28	24		0.8000	0.8571	0.8276
F	30	29	27		0.9000	0.9310	0.9152
Total	180	180	160		-	-	-
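The change between the coarse-grained and fine-grained results for AA-GCN can be tabulated directly from the "Correct classification" columns of Tables 3 and 6. The snippet is a small illustrative calculation; the helper name `rate_change` is hypothetical.

```python
# Correctly classified videos per class for AA-GCN (30 test videos per class)
coarse = {"A": 21, "B": 27, "C": 21, "D": 19, "E": 26, "F": 16}  # Table 3
fine = {"A": 28, "B": 28, "C": 28, "D": 25, "E": 24, "F": 27}    # Table 6
PER_CLASS = 30

def rate_change(before, after, n=PER_CLASS):
    """Change in the per-class correct-classification rate."""
    return {k: (after[k] - before[k]) / n for k in before}

changes = rate_change(coarse, fine)
for label, delta in changes.items():
    print(label, "%+.4f" % delta)

overall = (sum(fine.values()) - sum(coarse.values())) / (PER_CLASS * len(fine))
print("overall %+.4f" % overall)
```

Only class E (chip) shows a negative change, and the overall gain of about 0.1667 matches the rise in accuracy from 0.7222 to 0.8889.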

In summary, on the fine-grained dataset of tennis sub-movements the AA-GCN model reaches an accuracy of 0.8889, higher than the 0.7944 of the ST-GCN model and the 0.8278 of the AGCN model. This paper therefore uses the AA-GCN model trained on the fine-grained dataset of tennis sub-movements as the backbone network of the tennis movement classification model.

4.3

Analysis of the effect of the application of the system in tennis teaching

This paper takes the 2021 cohort of physical education students at X College as the experimental subjects. The first physical education class has 32 students (28 male, 4 female) and the second class has 32 students (29 male, 3 female), so 64 students in total took part in the teaching experiment. Pre-experiment tests showed no significant difference between the two classes in physical fitness, basic tennis skill, tennis technical movement, or cooperation ability, so the follow-up experiment could proceed. The control group was taught with the traditional teaching method, while the experimental group was taught with the traditional method supplemented by the information-based training system designed in this paper. Both classes completed a 15-week, 30-credit-hour tennis teaching experiment over the same period. The experiment tests the effectiveness of the training system in college tennis teaching and verifies its impact on students’ learning of tennis technical movements.

4.3.1

Comparative analysis of the performance of the control group in the tennis skills assessment

After 15 weeks (30 class hours) of teaching, a paired samples t-test was used to compare the control group’s scores before and after the experiment on the six tennis techniques, namely serve (A), forehand stroke (B), backhand stroke (C), high-pressure shot (D), chip (E), and net interception (F); the details are shown in Table 7. The scores for serve, forehand stroke, backhand stroke, and chip differ significantly before and after the experiment (P < 0.05), whereas the scores for the high-pressure shot and net interception do not (P > 0.05). The data indicate that the traditional teaching method still helps students acquire tennis skills and master techniques, but its teaching effect is modest.

Table 7.

T-test of the control group’s assessment results on the six tennis techniques (M±SD)

Test item	Pre-test	Post-test	T	P
A	43.50±13.85	44.87±11.34	-2.734	0.031*
B	47.06±10.52	48.57±10.07	-2.856	0.042*
C	41.33±9.06	42.84±10.44	-3.117	0.035*
D	42.25±9.73	43.38±9.86	-2.852	0.153
E	45.31±10.67	46.33±10.89	-3.007	0.027*
F	43.06±10.58	43.92±9.81	-2.537	0.055
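For reference, the paired samples t statistic is the mean of the per-student differences divided by its standard error. The sketch below takes the difference as pre-test minus post-test, which is an assumption that would explain the negative T values in Table 7. The scores are hypothetical and are not the study's data; the function name `paired_t` is likewise illustrative. Converting t to a P value requires the t distribution with n − 1 degrees of freedom, for example from a statistical table.

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Return (t, degrees of freedom) for a paired-samples t-test.

    Differences are taken as pre - post, so an improvement gives t < 0.
    """
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical scores for five students (not the study's data).
pre = [40.0, 45.0, 50.0, 38.0, 47.0]
post = [42.0, 46.0, 53.0, 39.0, 50.0]
t, df = paired_t(pre, post)
print("t = %.3f, df = %d" % (t, df))
```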

4.3.2

Comparative analysis of the performance of the experimental group in the tennis skill test

With the assistance of the information-based training system, the experimental group received the same 15 weeks (30 class hours) of teaching. A paired samples t-test was used to compare the experimental group’s scores on the six tennis techniques before and after the experiment; the details are shown in Table 8. For all six basic technical movements the pre-test and post-test scores differ at P < 0.001, an extremely significant difference. The reason is that, after a semester of study, the students’ technical level rose sharply from the low level at the start of the semester, when they had not yet received systematic instruction. With the assistance of the information-based training system, students at all ability levels progressed in technical mastery, so the experimental group’s improvement between the pre-test and post-test on the six technical movements is clear.

Table 8.

T-test of the experimental group’s tennis technical movement evaluation results (M±SD)

Test item	Pre-test	Post-test	T
A	43.89±12.12	50.18±13.06	-4.835
B	46.55±11.03	51.33±11.84	-5.966
C	41.73±9.82	49.37±12.66	-6.308
D	42.06±9.66	48.42±13.75	-4.342
E	45.14±10.01	52.07±15.17	-6.121
F	43.27±10.12	50.34±12.76	-5.384

4.3.3

Comparative analysis of the technical assessment scores of the control group and the experimental group

An independent samples t-test was used to compare the control group and the experimental group on the six tennis techniques after the experiment, as shown in Table 9. After the experiment, the two groups differ significantly (P < 0.001) on all six basic technical assessment indexes.

Table 9.

T-test comparison of technical assessment results between the two groups (M±SD)

Test item	Control group	Experimental group	T
A	44.87±11.34	50.18±13.06	-3.951
B	48.57±10.07	51.33±11.84	-4.872
C	42.84±10.44	49.37±12.66	-5.334
D	43.38±9.86	48.42±13.75	-5.007
E	46.33±10.89	52.07±15.17	-5.671
F	43.92±9.81	50.34±12.76	-4.021
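An independent samples t-test compares the means of two separate groups. The sketch below uses the pooled-variance (Student) form, which is an assumption, since the text does not say which variant was applied. The scores are hypothetical and are not the study's data; the function name `independent_t` is illustrative.

```python
import math
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Student's t with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a)
              + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical post-test scores (not the study's data).
control = [42.0, 44.0, 46.0, 48.0]
experimental = [48.0, 50.0, 52.0, 54.0]
t_ind, df_ind = independent_t(control, experimental)
print("t = %.3f, df = %d" % (t_ind, df_ind))
```

With the control group listed first, a higher experimental mean yields a negative t, the same sign convention as in Table 9.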

The data support three main observations:

Firstly, students in the experimental group practiced with the assistance of the information-based training system, which corrected mistakes quickly and effectively and guided the students. The teachers also set different standards for students at different levels, which kept the students in a good learning state, raised their enthusiasm for learning, and helped them acquire skills.

Secondly, the average scores of the experimental group on all six basic tennis skills are higher than those of the control group, which indicates that the information-based training system effectively helps students learn and practice basic tennis skills.

Thirdly, the six techniques of serve, forehand stroke, backhand stroke, high-pressure shot, chip, and net interception all improved significantly, which shows that the information-based training system is suitable for learning and practicing tennis techniques: it ensures the efficiency of students’ technique learning and also improves the quality of their practice.

5

Conclusion

This paper proposes a method for recognizing and evaluating tennis actions based on skeletal keypoints, realizing a hybrid model that combines pose estimation, tennis action recognition, and tennis scoring. First, a tennis action dataset was recorded on a professional tennis court with two groups of people, tennis coaches and trainees; it covers six categories of basic tennis technical actions: serve, forehand stroke, backhand stroke, high-pressure shot, chip, and net interception. The OpenPose-bm-based pose estimation method was used to obtain the skeletal key points of the human body, the coordinate system of the tennis action key point dataset was reconstructed, and parameters such as the body’s center of gravity and the velocity and acceleration of the key points were computed from the skeletal key points, providing the data basis for the subsequent classification and evaluation of tennis actions. The robustness and accuracy of the models on the fine-grained and coarse-grained datasets of tennis sub-movements were analyzed with confusion matrices. On the fine-grained dataset, the AA-GCN model reaches an accuracy of 0.8889, higher than the 0.7944 of the ST-GCN model and the 0.8278 of the AGCN model. By establishing a mapping between the scores given by tennis coaches and the evaluation method, the system can simulate a professional tennis coach’s evaluation of tennis movements and effectively improve the efficiency and quality of tennis training.
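The velocity and acceleration of a key point can be estimated from its per-frame positions by finite differences. The sketch below is one simple way to do this and is not necessarily how the system computes these parameters; the function name `finite_differences`, the frame rate, and the wrist track are hypothetical.

```python
def finite_differences(points, fps):
    """Estimate velocity and acceleration of one key point.

    points: list of (x, y) positions, one per video frame.
    fps: frame rate, used to convert frame steps to seconds.
    """
    dt = 1.0 / fps
    velocity = [((x1 - x0) / dt, (y1 - y0) / dt)
                for (x0, y0), (x1, y1) in zip(points, points[1:])]
    acceleration = [((vx1 - vx0) / dt, (vy1 - vy0) / dt)
                    for (vx0, vy0), (vx1, vy1) in zip(velocity, velocity[1:])]
    return velocity, acceleration

# Hypothetical wrist track: x = t^2 sampled at 10 frames per second, y constant.
track = [((i / 10) ** 2, 1.0) for i in range(5)]
vel, acc = finite_differences(track, fps=10)
print(len(vel), len(acc))  # prints 4 3
```

For this track the estimated horizontal acceleration is constant at about 2, as expected for x = t². Real key point tracks are noisy, so some smoothing before differencing would usually be needed.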

Siqi Mi

Published online: 19 March 2025

Submitted: 24 Oct. 2024

Accepted: 31 Jan. 2025

DOI: https://doi.org/10.2478/amns-2025-0486

Keywords: Pose estimation, Action recognition, Informative training system, OpenPose-bm, AA-GCN

© 2025 Siqi Mi, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
