Analysis of tennis techniques and tactics based on multiple linear regression model

In order to make up for the existing deficiencies in the analysis of the factors affecting the athletes' victory, scientifically and effectively judge the technical indicators that affect the athletes' victory


Introduction
Competition technical and tactical statistics is a cognitive activity that uses specific data to reflect the quantitative relationships and characteristics of the various aspects, links, and components of technique and tactics in sports competition; it is a research method for the systematic investigation of competition activities [1]. In tennis, this method is often used to evaluate athletes' competitive level and to guide training practice. However, the statistical indicators currently in use are insufficiently refined and not accurate enough, and individual indicators that are only weakly correlated with the result interfere with the accuracy of the conclusions. It is therefore necessary to use statistical methods to select the indicators closely related to the result of the competition and, on that basis, to establish a prediction equation for the result, which provides a reference for accurately analyzing athletes' technical level and for constructing subsequent evaluation models.
As a multivariate statistical method, the Logistic regression model has been widely used in different research fields. In economics, it can be used to build financial risk prediction models; in medicine, it is often used to find the factors associated with disease occurrence [2]; in sports health, it is used to identify potential causes that affect human health in daily activities. Statistical methods can thus help find scientific solutions to practical problems. Figure 1 shows the regression analysis framework [3].

Research objects
The research object is the technical analysis of winning in tennis, and its implications for training, based on the Logistic regression model.

1) Literature method: by searching databases such as CNKI and Wanfang for research on the analysis of athletes' winning factors, the author grasps the direction and patterns of existing research. Match data for each stage of the 2021 men's singles were obtained from the official website of the tennis open [4].
2) Mathematical statistics method: Excel is used to compile the technical and tactical indicators published on the official website of the tennis open: aces, double faults, first-serve in rate, first-serve scoring rate, second-serve scoring rate, net-point scoring rate, break-point scoring rate, return scoring rate, winners, and unforced errors. After the data are sorted, SPSS 21.0 is used to fit a Logistic regression model and analyze the main technical indicators that affect winning in the 2021 men's tennis singles. See Table 1 for the statistics of the winning variables in the 2021 men's singles [5].

Test of normality of variables
To ensure the accuracy of the regression model in predicting the outcome of men's tennis singles, the author adopts a multi-step analysis and testing procedure, beginning with a normality test on the 10 technical indicators selected from the 2021 men's singles matches. A normality test uses observed data to judge whether the population follows a normal distribution; commonly used methods include the Shapiro-Wilk test, the Kolmogorov-Smirnov test, and the skewness-kurtosis test. This study relies on the results of the Shapiro-Wilk test, which are shown in Table 2 [6].
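As a rough illustration of what a normality check computes, the sketch below implements the skewness-kurtosis approach (the Jarque-Bera statistic) listed above as an alternative. It is not the Shapiro-Wilk test used in the study, which is normally run in a statistics package such as SPSS. The function name and sample values are hypothetical, and the p-value relies on a large-sample chi-square approximation that can be inaccurate for small samples.

```python
import math

def jarque_bera(values):
    """Return (JB statistic, approximate p-value) for a skewness-kurtosis normality check."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Hypothetical indicator values, not taken from the paper's tables.
jb, p = jarque_bera([1, 1, 1, 1, 1, 1, 1, 1, 1, 50])
print("reject normality" if p < 0.05 else "no evidence against normality")
```

As in the text, a P value below 0.05 is read as evidence that the variable does not follow a normal distribution.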

Kruskal-Wallis test
For variables that do not follow a normal distribution according to the normality test, the Kruskal-Wallis rank sum test for comparing multiple independent samples is performed. This test uses the rank sums of the samples to infer whether the locations of the populations they represent differ. It is used here to judge how well each variable discriminates between the winning group and the losing group in the 2021 men's singles matches, that is, how significantly each indicator affects the final result. See Table 3 for the test results [9].
When the P value is less than 0.05, the between-group difference of a variable is statistically significant; when the P value is greater than 0.05, it is not. Table 3 shows significant between-group differences for double faults, second-serve scoring rate, net-point scoring rate, and break-point scoring rate, so these four technical indicators significantly affect winning or losing in the 2021 men's singles. The between-group differences for aces and winners are not significant, so these two indicators do not significantly affect the result [10][11].
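The rank sum comparison described above can be sketched for the two-group case (winning group versus losing group) as follows. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: the function names and sample data are hypothetical, ties are handled with the usual tie correction, and the P value uses the chi-square approximation with one degree of freedom.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_two_groups(group_a, group_b):
    """Return (H, p) for two independent samples, with H corrected for ties."""
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    sizes = [len(group_a), len(group_b)]
    rank_sums = [sum(ranks[:sizes[0]]), sum(ranks[sizes[0]:])]
    h = 12.0 / (n * (n + 1)) * sum(r * r / s for r, s in zip(rank_sums, sizes))
    h -= 3.0 * (n + 1)
    # Tie correction; it is zero (and H undefined) only if all values are equal.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(c ** 3 - c for c in counts.values()) / float(n ** 3 - n)
    h = max(h / correction, 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2.0))
    return h, p

# Hypothetical double-fault counts for winners and losers, not the paper's data.
h, p = kruskal_wallis_two_groups([1, 2, 3], [4, 5, 6])
print("significant" if p < 0.05 else "not significant")
```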

Independent sample t-test
The independent-sample t-test checks whether the difference between two groups of data is statistically significant; its premise is that both groups follow a normal distribution. This method is therefore applied to the four normally distributed variables, first-serve in rate, first-serve scoring rate, return scoring rate, and unforced errors, to judge how significantly each affects the outcome of the 2021 men's singles. See Table 4 for the test results [12]. Table 4 shows that the P values of the homogeneity-of-variance test are greater than 0.05, so the assumption of equal variances holds, and the results are read from the P values in the equal-variances-assumed column [13]. Under that assumption, the P values for first-serve scoring rate, return scoring rate, and unforced errors are all less than 0.05, which means these three indicators have a significant effect on the results of the 2021 men's singles [14]. The P value for first-serve in rate is greater than 0.05, which means this indicator does not have a significant effect on the outcome [15].
From the tests above, of the 10 preselected technical indicators only seven, namely double faults, first-serve scoring rate, second-serve scoring rate, net-point scoring rate, break-point scoring rate, return scoring rate, and unforced errors, significantly distinguish winning from losing in the 2021 men's singles. These seven are therefore taken as candidate variables in the subsequent analysis [16].
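For the equal-variance case referred to in this section, the t statistic can be sketched as below. The function name and sample values are hypothetical. The sketch returns only the statistic and its degrees of freedom; turning them into a P value requires a t distribution table or a statistics package, which is assumed to be available separately.

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """Return (t, df) for an independent-sample t-test assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Sample variances (n - 1 in the denominator).
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Hypothetical scoring rates (percent) for winners and losers, not the paper's data.
t, df = pooled_t_statistic([72.0, 75.5, 78.0, 80.5], [65.0, 68.5, 70.0, 71.5])
print(f"t = {t:.3f} with {df} degrees of freedom")
```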

Correlation analysis of variables
To prevent exact or high correlation among the explanatory variables (multicollinearity) from distorting the model's predictions or making its accuracy difficult to estimate, a correlation test is performed on the selected candidate variables. The closer the absolute value of the correlation coefficient r is to 1, the stronger the correlation between two variables; an absolute value of r greater than 0.8 indicates a strong correlation. The analysis shows no strong correlation among the candidate variables, so all of them are carried into the next stage of the analysis [17].
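The screening rule above (flag any pair with |r| greater than 0.8) can be sketched as follows. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here; the function names, variable labels, and values are also hypothetical.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def strongly_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of columns with |r| above the threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical candidate variables, one value per match.
candidates = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.5],
    "c": [3, 1, 4, 2],
}
for name_a, name_b, r in strongly_correlated_pairs(candidates):
    print(f"{name_a} and {name_b} are strongly correlated (r = {r:.2f})")
```

An empty result corresponds to the situation reported in the text, where no pair of candidate variables is strongly correlated.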

Logistic regression model construction
The seven technical indicators screened above are entered into the binary Logistic regression model, with the dependent variable coded as win = 1 and loss = 0. Using the backward stepwise method, three technical indicators, namely first-serve scoring rate, second-serve scoring rate, and return scoring rate, are retained to construct the final model, as shown in Table 5 [18]. Figure 2 shows that the overall accuracy of the final model is 82.1%. The model therefore predicts the outcome of 2021 men's singles matches with high accuracy, and the regression model has a good predictive effect [20].

Conclusion
The three core technical indicators that affect winning in the 2021 US Open men's singles are the first-serve scoring rate, the second-serve scoring rate, and the return scoring rate. These three indicators reflect not only the ability to break down and constrain the opponent's defense, but also the key to breaking through the opponent's service advantage and winning the match.
The author applies Logistic regression analysis to tennis technique and tactics. To ensure the accuracy of the regression model in predicting the outcome of men's singles, a multi-step analysis and testing procedure is adopted, beginning with a normality test on the 10 technical indicators selected from the 2021 men's singles matches. Among the 10 preselected variables, the P values of aces, double faults, second-serve scoring rate, net-point scoring rate, break-point scoring rate, and winners are all less than 0.05; these six variables do not follow a normal distribution, which means that they deviate considerably from the average level. Only first-serve in rate, first-serve scoring rate, return scoring rate, and unforced errors follow a normal distribution, which means that these four variables deviate less from the average level.
The seven selected technical indicators are entered into the binary Logistic regression model, with the dependent variable coded as win = 1 and loss = 0. Using the backward stepwise method, three technical indicators are finally retained, namely the first-serve scoring rate, the second-serve scoring rate, and the return scoring rate, and the overall accuracy of the final model is 82.1%. The higher the return scoring rate, the stronger the player's return ability and defensive counterattack ability, which also means that the player can turn passive defense on the return into active attack; this is the key for players to break through the opponent's service advantage and win the match.

Figure 2. Prediction accuracy of the Logistic winning regression model

Table 1. Statistics of Winning Variables in Men's Tennis Singles in 2021

Table 2. Statistics of Normality Test for Technical Indicators of Men's Tennis Singles Winning in 2021

Table 2 shows that among the 10 preselected variables, the P values of aces, double faults, second-serve scoring rate, net-point scoring rate, break-point scoring rate, and winners are all less than 0.05. These six variables do not follow a normal distribution, which means that these six technical variables of winning in men's tennis singles deviate considerably from the average level. Only first-serve in rate, first-serve scoring rate, return scoring rate, and unforced errors follow a normal distribution, which means that these four technical variables deviate less from the average level [7][8].

Table 5. Backward analysis results of X4, X5 and X8

Substituting the data in Table 5 into the regression equation gives the final Logistic regression model: Logit(P) = -20.800 + 0.120X4 + 0.053X5 + 0.236X8, where P is the probability that the player wins, X4 is the first-serve scoring rate, X5 is the second-serve scoring rate, and X8 is the return scoring rate. The final model is used to test the goodness of fit on the 2021 men's singles data with a classification cutoff of 0.5; the prediction accuracy of the regression model is shown in Figure 2 [19].
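To make the use of the fitted equation concrete, the sketch below converts Logit(P) into a win probability and applies the 0.5 cutoff. The coefficients are those reported above; everything else is an assumption for illustration. The function names and match values are hypothetical, and the three rates are assumed to be entered as percentages (for example 75 for 75%), which the text does not state explicitly.

```python
import math

# Coefficients reported for the final model.
INTERCEPT = -20.800
COEFFICIENTS = {"X4": 0.120, "X5": 0.053, "X8": 0.236}

def win_probability(x4, x5, x8):
    """Invert the logit: P = 1 / (1 + exp(-Logit(P)))."""
    logit = (INTERCEPT
             + COEFFICIENTS["X4"] * x4
             + COEFFICIENTS["X5"] * x5
             + COEFFICIENTS["X8"] * x8)
    return 1.0 / (1.0 + math.exp(-logit))

def predict_win(x4, x5, x8, cutoff=0.5):
    """Classify as win (1) when the probability reaches the cutoff, else loss (0)."""
    return 1 if win_probability(x4, x5, x8) >= cutoff else 0

def accuracy(matches, cutoff=0.5):
    """Share of (x4, x5, x8, outcome) records whose prediction matches the outcome."""
    matches = list(matches)
    correct = sum(1 for x4, x5, x8, outcome in matches
                  if predict_win(x4, x5, x8, cutoff) == outcome)
    return correct / len(matches)

# Hypothetical player: 75% first-serve, 55% second-serve, 40% return scoring rate.
print(round(win_probability(75, 55, 40), 3))
```

With these assumed inputs the logit is 0.555, which corresponds to a win probability of roughly 0.64, so the match would be classified as a win at the 0.5 cutoff.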