At present, the methods for predicting college sports performance mainly include time series models, empirical models based on econometric principles, and neural network models. Among them, regression models based on econometric principles can comprehensively analyse the factors that influence college sports performance and provide a basis for its quantitative prediction, and they are the approach preferred by most scholars. However, the author's research found that a predictive model built purely on economic principles may suit one or a few schools but cannot justifiably be applied to all schools, and its predictions are often unsatisfactory [1]. To address this, this research proposes a prediction method that combines a genetic algorithm (GA), college sports performance evaluation and regression analysis. GA is used to dynamically optimise the college sports performance evaluation in a supervised manner, and on this basis a regression-based prediction model of college sports performance is established. The calculation results show that the model achieves high prediction accuracy.
Research on models for predicting college sports performance can be grouped into time series prediction models, empirical models and neural network models, which study how factors such as the nature and economic level of a school affect its performance in college sports competitions. Among subsequent studies, Bernard and Busse used the Cobb-Douglas production function to build a multivariate nonlinear model of the merit number.
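For orientation, the textbook Cobb-Douglas form and its log-linearisation are sketched below. This is an illustration under assumptions, not the exact specification of Bernard and Busse or of model (3): Y is taken as sporting output (for example the merit number), N as population, K as economic resources, A as a school-specific efficiency term, and α, β as elasticities.

```latex
Y_i = A_i \, N_i^{\alpha} K_i^{\beta}
\;\Longrightarrow\;
\log Y_i = \log A_i + \alpha \log N_i + \beta \log K_i .
% Writing K_i = N_i \cdot (K_i / N_i), i.e. resources = population x per-capita resources:
\log Y_i = \log A_i + (\alpha + \beta) \log N_i + \beta \log\!\left(\frac{K_i}{N_i}\right).
```

If N is read as population (POP) and K/N as per-capita GDP (PGDP), this is one way to see why log(POP) and log(PGDP) appear as regressors in Table 2.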
Model (3) shows that evaluating students' college sports performance is both the focus and the main difficulty of prediction research. However, all existing evaluations of college sports performance rely on unsupervised clustering. This approach has several drawbacks: the results depend on which data set is clustered, which clustering method is chosen, and whether outstanding scores and the merit number are weighted equally; moreover, the number of clusters is very difficult to determine and can only be estimated empirically. Such subjective estimates inevitably reduce the accuracy of the algorithm. To overcome these shortcomings, the author uses GA to evaluate college sports performance in a supervised manner. GA encodes candidate solutions as a population of genomes, takes the fitness function as the optimisation goal, and produces the next generation of optimised gene combinations through genetic operations, repeating until the convergence criterion is met [2].
An important reason for the wide use of GA is its ability to search globally. Because the GA population is diverse, the search proceeds in many directions at once, a considerable improvement over earlier gradient methods, which search in only one direction. Moreover, GA does not require the optimisation problem to be continuous or differentiable. GA can therefore dynamically optimise the college sports performance evaluation, and prediction based on the multivariate nonlinear model of college sports performance then achieves high accuracy and strong objectivity. The prediction process based on GA-optimised college sports performance evaluation proposed in this research is shown in Figure 1:
GA uses the goodness of fit R2 to evaluate the performance and prediction accuracy of the college sports performance evaluation, and converts this objective function into a fitness function. The algorithm starts from a randomly generated population, in which each chromosome represents one set of college sports performance grades. Each chromosome is evaluated with the fitness function to obtain its fitness value; the greater the fitness, the better optimised the corresponding performance evaluation and the better its prediction. From the fitness values, the probability of each chromosome being chosen in the selection operation is calculated, and stochastic universal sampling is applied with these probabilities to select chromosomes that form a new population. Chromosomes are then chosen for crossover according to the crossover probability, and finally some gene positions are mutated according to the mutation probability. These operations keep the set of performance grades represented by the chromosomes diverse throughout the search, which greatly aids optimisation and improves the chance of finding the optimal solution. The algorithm terminates after a maximum number of iterations (generations), and the best solution obtained at that point is taken as the optimal solution [3].
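The loop described above can be sketched as follows. This is a minimal, hypothetical skeleton (`run_ga` and its parameters are illustrative names, not the author's MATLAB code): the objective (R2 in this study), selection, crossover and mutation are passed in as functions. For simplicity it replaces the whole population each generation instead of modelling the generation gap (GGAP) and reinsertion, and it only returns the best solution found within `max_gen` generations, which a GA cannot guarantee to be the global optimum.

```python
import random
from typing import Callable, List, Sequence, Tuple

Chromosome = List[int]


def run_ga(
    population: Sequence[Chromosome],
    objective: Callable[[Chromosome], float],
    select: Callable[[List[Chromosome], List[float]], List[Chromosome]],
    crossover: Callable[[List[Chromosome]], List[Chromosome]],
    mutate: Callable[[Chromosome], Chromosome],
    max_gen: int,
) -> Tuple[Chromosome, float]:
    """Generational GA loop: evaluate -> select -> crossover -> mutate.

    Runs for max_gen generations (the maximum-iteration end condition) and
    returns the best chromosome seen in any generation with its objective.
    """
    pop = [list(c) for c in population]
    scores = [objective(c) for c in pop]
    best_i = max(range(len(pop)), key=scores.__getitem__)
    best, best_val = list(pop[best_i]), scores[best_i]
    for _ in range(max_gen):
        parents = select(pop, scores)             # e.g. rank fitness + SUS
        pop = [mutate(c) for c in crossover(parents)]
        scores = [objective(c) for c in pop]      # e.g. R^2 of model (3)
        i = max(range(len(pop)), key=scores.__getitem__)
        if scores[i] > best_val:                  # remember the best so far
            best, best_val = list(pop[i]), scores[i]
    return best, best_val


if __name__ == "__main__":
    rng = random.Random(0)
    pop0 = [[rng.randint(1, 4) for _ in range(6)] for _ in range(10)]
    toy = lambda c: -abs(sum(c) - 15)  # toy objective, NOT the paper's R^2

    def keep_top_half(p, s):
        # crude truncation selection, used only for this demo
        top = sorted(range(len(p)), key=s.__getitem__)[len(p) // 2:]
        return [list(p[i]) for i in top] * 2

    best, val = run_ga(pop0, toy, keep_top_half, lambda p: p,
                       lambda c: [rng.randint(1, 4) if rng.random() < 0.2 else g for g in c], 40)
    print(best, val)
```

Keeping the operators as parameters makes it easy to swap in the rank-based fitness, SUS, single-point crossover and uniform mutation described in this section.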
Encoding is the prerequisite for solving a problem with GA. This study uses integer encoding for the college sports performance grades. Before the chromosomes are encoded, the range [
Each chromosome represents a set of college sports performance grades for the schools (regions). The length of the chromosome is the number of schools (regions). Each gene is a college sports performance grade, and identical genes indicate that the corresponding schools belong to the same performance grade. Taking an integer k from the value range of C means that the set is divided into k college sports performance grades. The chromosome can be expressed as: [
For example, in this study,
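The integer encoding can be illustrated with a short sketch. It assumes one gene per school (region), genes that are integers in 1..k, and (reading "the set contains k levels" literally) that every grade 1..k must appear at least once; `random_chromosome` and `init_population` are hypothetical helpers, not part of the original implementation.

```python
import random
from typing import List


def random_chromosome(n_units: int, k: int, rng: random.Random) -> List[int]:
    """One integer-coded chromosome: gene i is the grade (1..k) of school i.

    Every grade 1..k appears at least once, so the chromosome really splits
    the schools (regions) into k performance grades.
    """
    if not 1 <= k <= n_units:
        raise ValueError("need 1 <= k <= number of schools")
    genes = list(range(1, k + 1))                      # guarantee each grade
    genes += [rng.randint(1, k) for _ in range(n_units - k)]
    rng.shuffle(genes)                                 # random positions
    return genes


def init_population(pop_size: int, n_units: int, k: int, seed: int = 0) -> List[List[int]]:
    """Random initial population for a fixed number of grades k."""
    rng = random.Random(seed)
    return [random_chromosome(n_units, k, rng) for _ in range(pop_size)]


if __name__ == "__main__":
    # e.g. 62 schools (regions) split into k = 4 grades, population of 5
    for chrom in init_population(5, 62, 4):
        print(chrom[:10], "...")
```

Seeding the first k genes with 1..k and shuffling is just one simple way to enforce the "all k grades present" assumption; it does not sample valid chromosomes uniformly.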
The chromosome code is then converted into dummy variables. To avoid the 'dummy variable trap', k − 1 dummy variables, D1 to D(k−1), are used for k grades, as listed below.
Dummy variable settings for college sports performance grades.
Dummy variable | Grade 1 | Grade 2 | Grade 3 | Grade k−1 | Grade k |
D1 | 0 | 1 | 1 | 1 | 1 |
D2 | 1 | 0 | 1 | 1 | 1 |
D3 | 1 | 1 | 0 | 1 | 1 |
. . . | . . . | . . . | . . . | . . . | . . . |
D(k−1) | 1 | 1 | 1 | 0 | 1 |
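The conversion from a chromosome to dummy variables, and the R2 objective that GA maximises, can be sketched as below. This assumes model (3) is linear in its coefficients once the regressors (such as log(POP), log(PGDP), Home and Mt-1 from Table 2) are formed, and it follows the coding in the table above (Dj = 0 for grade j, 1 otherwise). `grade_dummies` and `r2_objective` are hypothetical names; the covariate matrix `X` is an assumption standing in for the paper's regressors.

```python
import numpy as np


def grade_dummies(chromosome, k):
    """k-1 dummy columns D1..D(k-1) for integer grades 1..k.

    Follows the table above: Dj = 0 when the grade equals j and 1 otherwise,
    so the reference grade k has all dummies equal to 1.
    """
    g = np.asarray(chromosome)
    if g.min() < 1 or g.max() > k:
        raise ValueError("grades must lie in 1..k")
    if k < 2:
        return np.empty((len(g), 0))
    return np.column_stack([(g != j).astype(float) for j in range(1, k)])


def r2_objective(y, X, chromosome, k):
    """Goodness of fit R^2 of an OLS fit of y on [1, X, D1..D(k-1)]."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack(
        [np.ones(len(y)), np.asarray(X, dtype=float), grade_dummies(chromosome, k)]
    )
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # least squares fit
    ssr = np.sum((y - design @ beta) ** 2)              # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)                   # total sum of squares
    return float(1.0 - ssr / sst)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))          # stand-ins for e.g. log(POP), log(PGDP)
    grades = rng.integers(1, 4, size=30)  # a chromosome with k = 3
    y = X @ np.array([1.0, -0.5]) + 2.0 * (grades == 1) + rng.normal(scale=0.1, size=30)
    print(r2_objective(y, X, grades, 3))
```

Because an intercept is included, this 0-for-own-grade coding spans the same column space as the more common 1-for-own-grade coding, so R2 is identical under either convention.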
The fitness function converts objective function values into relative fitness values. To prevent premature convergence, fitness can be computed from the rank of each individual's objective value in the population rather than from the value itself: individuals are sorted by objective value obj from small to large, each is assigned a fitness value according to its position in this order, and individuals with the same rank receive the same fitness value. The fitness is calculated by Equation (5).
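Equation (5) itself is not recoverable from the text, so the sketch below uses the widely used linear-ranking scheme as an assumed stand-in. Individuals are sorted by objective from small to large, position i (1-based) receives 2 − sp + 2(sp − 1)(i − 1)/(n − 1), and tied objective values share the mean fitness of the positions they occupy. The selective pressure `sp` (default 2) and the name `rank_fitness` are assumptions.

```python
from typing import List, Sequence


def rank_fitness(obj: Sequence[float], sp: float = 2.0) -> List[float]:
    """Linear rank-based fitness for a maximisation objective such as R^2.

    Sorted from small to large, position i (1-based) gets
    2 - sp + 2 * (sp - 1) * (i - 1) / (n - 1): the best individual gets sp,
    the worst gets 2 - sp. Ties share the mean fitness of their positions.
    """
    n = len(obj)
    if n == 1:
        return [1.0]
    if not 1.0 <= sp <= 2.0:
        raise ValueError("selective pressure sp must lie in [1, 2]")
    order = sorted(range(n), key=lambda idx: obj[idx])
    fitness = [0.0] * n
    start = 0
    while start < n:
        end = start
        # extend the block of tied objective values
        while end + 1 < n and obj[order[end + 1]] == obj[order[start]]:
            end += 1
        mean_pos = (start + end) / 2 + 1  # mean 1-based position of the block
        value = 2 - sp + 2 * (sp - 1) * (mean_pos - 1) / (n - 1)
        for pos in range(start, end + 1):
            fitness[order[pos]] = value
        start = end + 1
    return fitness


if __name__ == "__main__":
    print(rank_fitness([0.2, 0.9, 0.5]))
```

Only the order of the objective values matters here, so a single individual with an unusually high R2 cannot dominate selection early on. With sp = 2 the worst individual gets fitness 0 and can never be selected; a value such as 1.5 keeps it in play.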
The selection operator determines how a certain number of good individuals are chosen from the parent population, according to the generation gap (GGAP), to pass into the next generation. To improve global convergence and computational efficiency, stochastic universal sampling (SUS) is used as the selection method. SUS is a single-phase sampling algorithm with zero bias and minimum spread. Instead of the single selection pointer used in the roulette-wheel method, SUS uses S equally spaced pointers, where
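Stochastic universal sampling can be sketched as follows. Here `n_select` plays the role of S, and choosing it as round(GGAP × population size) is an assumption; the function name `sus` is hypothetical. One random number places all pointers, which are spaced total/S apart on the cumulative fitness wheel.

```python
import random
from typing import List, Sequence


def sus(fitness: Sequence[float], n_select: int, rng: random.Random) -> List[int]:
    """Stochastic universal sampling: indices of n_select chosen individuals.

    The pointers are total/n_select apart and the first is drawn uniformly
    from [0, total/n_select], so one random number places them all.
    """
    total = float(sum(fitness))
    if total <= 0 or n_select < 1:
        raise ValueError("need positive total fitness and n_select >= 1")
    step = total / n_select
    start = rng.uniform(0, step)
    chosen, i, cumulative = [], 0, fitness[0]
    for m in range(n_select):
        pointer = start + m * step
        # advance to the individual whose fitness segment contains the pointer
        while cumulative <= pointer and i < len(fitness) - 1:
            i += 1
            cumulative += fitness[i]
        chosen.append(i)
    return chosen


if __name__ == "__main__":
    print(sus([3.0, 1.0, 2.0, 2.0], 4, random.Random(0)))
```

Because the pointers are evenly spaced, an individual with fitness share f is selected either floor(S·f) or ceil(S·f) times, which is the "minimum spread" property; roulette-wheel selection with S independent spins offers no such bound.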
Uniform mutation is used: with a small probability, the mutation probability Pm, the original gene value at each locus of an individual's code string is replaced by a random number drawn uniformly from a given range. This helps prevent premature convergence to a locally optimal rather than a globally optimal solution [4].
The specific steps of uniform mutation are: 1. take each locus in the individual's code string in turn as a mutation point; 2. at each mutation point, with mutation probability Pm, replace the original value with a random number drawn from the value range of the corresponding gene.
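The two steps above map directly onto a one-line comprehension. This sketch assumes integer grades in 1..k as the gene value range; `uniform_mutation` is a hypothetical name.

```python
import random
from typing import List, Sequence


def uniform_mutation(chromosome: Sequence[int], k: int, pm: float,
                     rng: random.Random) -> List[int]:
    """Uniform mutation for integer-coded grades in 1..k.

    Step 1: visit every locus in turn. Step 2: with probability pm, replace
    its gene by an integer drawn uniformly from 1..k (the gene's range).
    """
    return [rng.randint(1, k) if rng.random() < pm else gene for gene in chromosome]


if __name__ == "__main__":
    print(uniform_mutation([1, 2, 3, 1, 2], k=3, pm=0.2, rng=random.Random(0)))
```

Note that the replacement value may coincide with the old one, so the effective change rate is slightly below Pm.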
Single-point crossover randomly sets a single crossover point in the individuals' code strings and exchanges the parts of the chromosomes of two paired individuals at that point. Here, a crossover position is set at random for paired individuals in the population, and crossover is performed with crossover probability Pc: the two paired chromosomes exchange the genes beyond the crossover position, producing the new generation. Figure 3 is a schematic diagram of the single-point crossover operation.
Single-point crossover is implemented as follows: 1. pair the individuals at random; with a population of size M there are ⌊M/2⌋ pairs; 2. for each pair, randomly choose the position after some locus as the crossover point; for chromosomes of length N there are N − 1 possible crossover positions; 3. for each pair, with crossover probability Pc, exchange the parts of the two chromosomes after the crossover point, producing two new individuals [5].
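The three steps above can be sketched as follows, assuming an odd individual left over after pairing is copied unchanged (the text does not say how it is handled). `single_point_crossover` is a hypothetical name.

```python
import random
from typing import List, Sequence


def single_point_crossover(population: Sequence[Sequence[int]], pc: float,
                           rng: random.Random) -> List[List[int]]:
    """Pair individuals at random and apply single-point crossover.

    With M individuals there are floor(M/2) pairs (an odd one out is copied
    unchanged). For chromosomes of length N a cut point is drawn from the
    N - 1 positions between loci; with probability pc the tails are swapped.
    """
    order = list(range(len(population)))
    rng.shuffle(order)                         # step 1: random pairing
    offspring = [list(population[i]) for i in order]
    for a in range(0, len(offspring) - 1, 2):
        left, right = offspring[a], offspring[a + 1]
        n = len(left)
        if n > 1 and rng.random() < pc:
            cut = rng.randint(1, n - 1)        # step 2: cut after locus `cut`
            # step 3: exchange the parts after the crossover point
            left[cut:], right[cut:] = right[cut:], left[cut:]
    return offspring


if __name__ == "__main__":
    print(single_point_crossover([[1, 1, 1, 1], [2, 2, 2, 2]], 1.0, random.Random(3)))
```

Slicing copies the tails before assignment, so the swap is safe even though both lists are modified in place.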
To evaluate the prediction accuracy of the model and compare it with the alternative, this study uses the error indicators defined in formulas (6)–(9).
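Formulas (6)–(9) are not recoverable from the text. The text names MAE; the sketch below adds RMSE, MAPE and R2 purely as assumed, commonly used companions, so it should be read as an illustration of how such indicators are computed rather than as the paper's definitions. `error_indicators` is a hypothetical name.

```python
import math
from typing import Dict, Sequence


def error_indicators(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, MAPE and R^2 of a set of predictions.

    These are common choices assumed for illustration; they are not
    necessarily the exact indicators of formulas (6)-(9).
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mean_a = sum(actual) / n
    sst = sum((a - mean_a) ** 2 for a in actual)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sse / n),
        "MAPE": sum(abs(e / a) for e, a in zip(errors, actual)) / n,  # as a fraction
        "R2": 1.0 - sse / sst if sst > 0 else float("nan"),
    }


if __name__ == "__main__":
    print(error_indicators([1, 2, 3, 4], [1, 2, 3, 5]))
```

MAE, RMSE and MAPE are "smaller is better", while R2 is "larger is better", which matters when reading a comparison table row by row.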
This study uses actual college physical education data from 2014 to 2018 as sample data, selects 62 schools (regions) as the research objects, and uses the 2018 college physical education performance to test the prediction model. The algorithm is implemented in MATLAB, and the GA control parameters are set as: initial population number
To compare the influence of the number of college sports performance grades on the multiple regression model, the GA-optimised nonlinear multiple regression model is used to calculate the best goodness of fit R2 for every number of grades in the range of C. The results are shown in Figure 4, whose four sub-plots show the data for Jiangxi, Henan, Heilongjiang and Jiangsu provinces, respectively.
It can be seen from Figure 4 that for the prediction of merit number, when the number of college sports performance grades is
Based on this analysis, the number of college sports performance grades is set to 7 in the school (region) merit-number prediction model and to 4 in the excellent-performance prediction model, and regression analysis is performed on the sample data (Table 2).
Summary of the regression results of the share of outstanding and excellent results in college sports from 2014 to 2018
log (POP) | 0.700580484 | 4.046 | 0.415217178 | 3.612307 |
log (PGDP) | 0.00492357 | 3.8 | 0.507549691 | 1.092639 |
Home | 0.720660842 | 1.656 | 0.950142728 | 3.393209 |
Mt-1 | 0.935333311 | 0.9 | 0.733616507 | 4.407553 |
0.819858339 | 0.645 | 0.949132464 | 2.307292 | |
D1 | 0.233461872 | 1.489 | 0.861656496 | 0.10588 |
D2 | 0.522969528 | 4.718 | 0.159542912 | 1.948906 |
D3 | 0.887297088 | 3.593 | 0.108918231 | 3.609763 |
D4 | 0.673300591 | 4.348 | 0.916780756 | 1.823459 |
D5 | 0.681120134 | 1.667 | 0.871739745 | 1.593532 |
D6 | 0.767696616 | 2.757 | 0.6677149 | 1.354243 |
Statistical test |
Based on the results in Table 2, the 2018 excellent results can be predicted (Table 3). Finally, the prediction evaluation indicators are calculated both for the prediction results of the model in the literature and for those of the model proposed in this study (Table 4). Table 4 shows that the proposed model has clear advantages in predicting grades; in predicting excellent results, it outperforms the model from the literature on all indicators except MAE, on which it is slightly worse.
Classification results of the merits and college sports performance of each school (region).
A | 98.55 | 98 | 1 |
B | 98 | 98 | 1 |
C | 98 | 98 | 1 |
D | 98 | 98 | 1 |
E | 97.25 | 97 | 2 |
F | 97.11 | 97 | 2 |
G | 97.1 | 97 | 2 |
H | 96.12 | 96 | 3 |
I | 92.12 | 92 | 4 |
J | 92 | 92 | 4 |
K | 90 | 90 | 5 |
L | 90 | 90 | 5 |
M | 89.11 | 89 | 6 |
N | 89.12 | 89 | 6 |
O | 87.12 | 87 | 7 |
P | 86 | 86 | 8 |
Summary of the prediction statistical indicators of the two models.
Grades | FCM | 7.123 | 0.5446 | 4.789 | 0.954 |
Grades | GA | 6.785 | 0.5214 | 3.256 | 0.957 |
Excellent results | FCM | 3.456 | 0.4851 | 2.0657 | 0.925 |
Excellent results | GA | 3.278 | 0.4712 | 2.2145 | 0.952 |
FCM, fuzzy C-means; GA, genetic algorithm.
As Table 4 shows, in the FCM-regression model the college sports performance evaluation is based on unsupervised fuzzy C-means clustering and is difficult to describe objectively. Its ability to optimise the combination of school (region) college sports performance grades is therefore limited, its predictive ability cannot be guaranteed, and its prediction accuracy is relatively low [7].
The GA-regression model proposed in this research computes the school (region) college sports performance grades in a supervised manner through GA and can dynamically mine the best college sports performance evaluation [8], so that the prediction model based on college sports performance is optimised. At the same time, the subjectivity of the prediction model is reduced, and the accuracy and stability of the prediction are higher [9].
GA enables effective supervised calculation of the school (region) college sports performance grades and can dynamically mine the best college sports performance evaluation, so that prediction model (3), which is based on college sports performance, is optimised. At the same time, the objectivity of the prediction model is improved, and the predictions of the merit number and of excellent results are accurate and stable. The GA-optimised nonlinear multiple regression model can also determine the number of college sports performance grades for the schools (regions): 7 grades for merit-number prediction and 4 grades for excellent-performance prediction.