Research on the Optimization of Personalized Learning Paths and Teaching Practice Strategies of Deep Reinforcement Learning for Dance Choreographers
Published online: 26 Sep 2025
Received: 19 Jan 2025
Accepted: 20 Apr 2025
DOI: https://doi.org/10.2478/amns-2025-1041
Keywords
© 2025 Liang Ma, published by Sciendo.
This work is licensed under the Creative Commons Attribution 4.0 International License.
With the development of the times and the improvement of people’s living standards, dance has become an important form of culture and art, and more and more people have begun to learn and appreciate it. In colleges and universities, choreography has also become a popular major [1-4]. Dance choreography is a comprehensive art discipline that requires students to have solid basic skills and rich knowledge of dance, and to be able to use choreography techniques skillfully in creation and performance. However, the sustainable development of choreography programs in colleges and universities and the cultivation of professional talents cannot be separated from personalized professional teaching methods [5-8].
Dance choreography is an art form covering a variety of elements; it requires not only skill training and performance but also the in-depth exploration and cultivation of individual characteristics. Therefore, the teaching of choreography requires both a grasp of what students have in common and attention to the cultivation of individuality [9-12]. With the development of education, more and more educators and students lean toward personalized learning, and the design and implementation of personalized curricula has become a very important educational task. For a course such as choreography, individualized design and implementation are even more challenging [13-16].
Educators need to understand students’ backgrounds, interests, learning styles, and physical conditions, and, based on these needs, design appropriate choreography programs and activities to meet students’ learning and developmental needs. In addition, when designing personalized dance choreography courses, educators need to emphasize students’ comprehensive literacy [17-20]. Choreography requires a great deal of physical training, cultural awareness, and aesthetic ability, so educators need to design challenging training and practice of choreography works according to students’ physical conditions and abilities [21-23] in order to help students improve their choreography level and quality. Deep reinforcement learning plays an optimizing role for such personalized learning and thus promotes the development of teaching practice [24-25].
Deep reinforcement learning combines the advantages of deep learning and reinforcement learning and provides an efficient decision-optimization method. In this paper, we first review the basic principles of reinforcement learning and deep learning, and design an adaptive learning path recommendation model for dance choreographers based on a reinforcement learning algorithm that combines value and policy. Learning goal features and domain knowledge features are added to the model, and an LSTM and a Transformer are used to predict the learner’s cognitive state and knowledge-point coverage, respectively, while changes in the difficulty of the learning content are also taken into account. The learner’s state, action, and reward values are modeled mathematically using the Actor-Critic algorithm, and the D3QN algorithm is used to implement the choreography content recommendation function. In addition, the effect of learning path optimization is tested through experiments, and the practical effect of the new teaching strategy is verified through t-tests.
Reinforcement learning addresses the problem of how an agent can maximize the rewards it obtains in a complex, uncertain environment. A reinforcement learning system consists of two parts: the agent and the environment. After the agent observes the state $s_t$ returned by the environment, it selects an action $a_t$ according to its policy; the environment then feeds back a reward $r_t$ and moves to the next state $s_{t+1}$, and the interaction repeats.
A policy is the model by which the agent chooses its next action. The agent decides its subsequent actions according to a certain policy. Policies can be categorized into two types: stochastic and deterministic.
The stochastic policy is commonly represented by the conditional probability distribution in equation (1):
$\pi(a\mid s)=P(A_t=a\mid S_t=s)$ (1)
The deterministic policy is as in equation (2):
$a_t=\mu(s_t)$ (2)
The value function is used to evaluate the goodness of the current state: a state is good if it leads to high rewards from the subsequent actions. The larger the value of the value function, the more considerable the expected future reward, and the more favorable the current state is for the agent. For all $s\in S$, the state-value function under policy $\pi$ is defined as in equation (3):
$V_{\pi}(s)=\mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty}\gamma^{k}r_{t+k+1}\mid S_t=s\right]$ (3)
where $\gamma\in[0,1]$ is the discount factor and $r_{t+k+1}$ is the reward received $k+1$ steps after time $t$.
The next state of the agent is determined jointly by its current state and the action it takes at this moment, a process that involves two key elements: the state transition probability and the reward function. The transition probability of taking action $a$ in state $s$ and moving to state $s'$ is defined as in equation (4):
$p(s'\mid s,a)=P(S_{t+1}=s'\mid S_t=s, A_t=a)$ (4)
The reward function, on the other hand, defines the expected reward the system obtains for performing action $a$ in state $s$, as in equation (5):
$R(s,a)=\mathbb{E}\!\left[r_{t+1}\mid S_t=s, A_t=a\right]$ (5)
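For readers less familiar with these definitions, the following toy sketch (not part of this paper’s model) shows how the transition probabilities $p(s'\mid s,a)$ and the reward function $R(s,a)$ of a small two-state MDP can be written down and used in a single Bellman backup; all states, actions, and values are illustrative.

```python
# Toy illustration (not the paper's model): a two-state MDP whose
# transition probabilities p(s'|s,a) and reward function R(s,a)
# are stored as plain dictionaries.
transitions = {
    ("s0", "a0"): {"s0": 0.2, "s1": 0.8},   # p(s'|s0, a0)
    ("s0", "a1"): {"s0": 0.9, "s1": 0.1},
    ("s1", "a0"): {"s0": 0.5, "s1": 0.5},
    ("s1", "a1"): {"s0": 0.0, "s1": 1.0},
}
rewards = {("s0", "a0"): 1.0, ("s0", "a1"): 0.0,
           ("s1", "a0"): 0.5, ("s1", "a1"): 2.0}

def one_step_backup(state: str, action: str, value: dict, gamma: float = 0.9) -> float:
    """One Bellman backup: R(s,a) + gamma * sum_s' p(s'|s,a) * V(s')."""
    return rewards[(state, action)] + gamma * sum(
        p * value[s_next] for s_next, p in transitions[(state, action)].items()
    )

V = {"s0": 0.0, "s1": 0.0}
print(one_step_backup("s0", "a0", V))  # 1.0 on the first backup
```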
Deep reinforcement learning integrates the ability of deep learning to parse complex data with the decision-making ability of reinforcement learning, and can generate optimal decisions directly from multidimensional input information, constructing a seamless end-to-end decision control system. In this system, the agent feeds the information generated by interacting with the environment into the network, accumulating experience and driving iterative updates of the decision-network parameters, with a view to learning the optimal decision policy [27].
The DQN (Deep Q-Network) algorithm is a classic value-based deep reinforcement learning algorithm. DQN combines a convolutional neural network with the Q-learning algorithm of traditional reinforcement learning, and uses an experience replay mechanism to store the transition samples $(s_t, a_t, r_t, s_{t+1})$ generated by interaction with the environment, from which random mini-batches are drawn for training.
The DQN model uses a deep convolutional neural network to approximate the optimal action-value function in equation (7):
$Q^{*}(s,a)=\max_{\pi}\mathbb{E}\!\left[r_t+\gamma r_{t+1}+\gamma^{2} r_{t+2}+\cdots\mid s_t=s, a_t=a,\pi\right]$ (7)
where $\theta$ denotes the parameters of the approximating network $Q(s,a;\theta)$.
In addition to using a deep convolutional network $Q(s,a;\theta)$ to approximate the current value function, the DQN model uses another network of the same structure, the target network $Q(s,a;\theta^{-})$, to produce the target values used in training.
At network initialization, the target-network parameters $\theta^{-}$ are copied from the current-network parameters $\theta$; thereafter, $\theta^{-}$ is refreshed with $\theta$ only every fixed number of iterations. Training minimizes the squared error between the target value $y_t=r_t+\gamma\max_{a'}Q(s_{t+1},a';\theta^{-})$ and the current estimate $Q(s_t,a_t;\theta)$.
Using SGD, taking the partial derivatives with respect to the parameter $\theta$ gives the gradient formula
$\nabla_{\theta}L(\theta)=\mathbb{E}\!\left[\big(y_t-Q(s_t,a_t;\theta)\big)\nabla_{\theta}Q(s_t,a_t;\theta)\right]$.
A significant drawback of the DQN algorithm is that the max operator in the target tends to overestimate action values, and the algorithm is only applicable to discrete, low-dimensional action spaces.
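The temporal-difference target and loss described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the network used in this paper: the state dimension, the number of actions, and the two-layer MLP are assumed placeholders.

```python
import torch
import torch.nn as nn

# Minimal sketch of the DQN update described above; state_dim, n_actions
# and the two-layer MLP are illustrative choices, not the paper's network.
class QNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, s):
        return self.net(s)

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    s, a, r, s_next, done = batch                    # tensors sampled from the replay buffer
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a; theta)
    with torch.no_grad():                            # the target network is not updated by this loss
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    return nn.functional.mse_loss(q_sa, target)
```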
Policy-based methods are suitable for continuous or high-dimensional action spaces and have the advantages of simple policy parameterization and fast convergence.
The REINFORCE algorithm is a typical policy-based deep reinforcement learning algorithm. It is based on the idea of gradient ascent to maximize long-term returns by directly updating the parameters of the policy function. The core idea of the REINFORCE algorithm is to use the policy function to define the probability distribution of choosing an action in a given state, and then compute the gradient based on the trajectories obtained from sampling, and ultimately use the gradient ascent method to update the parameters of the policy function.
The advantage of the REINFORCE algorithm is that it directly optimizes the policy function without needing to estimate a value function, making it suitable for problems with both discrete and continuous action spaces. However, the REINFORCE algorithm also has disadvantages, such as low sampling efficiency and high variance [29].
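A minimal sketch of the REINFORCE update follows, assuming a policy network that outputs a categorical distribution over actions and that the log-probabilities of the sampled actions have been collected along one trajectory; the return normalization is a common variance-reduction add-on rather than part of the basic algorithm.

```python
import torch

# Minimal sketch of the REINFORCE update: weight the log-probabilities of the
# sampled actions by the discounted return and ascend the policy gradient.
def reinforce_update(optimizer, log_probs, rewards, gamma=0.99):
    returns, g = [], 0.0
    for r in reversed(rewards):                 # compute discounted returns G_t
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # optional variance reduction
    loss = -(torch.stack(log_probs) * returns).sum()               # gradient ascent via negative loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```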
The Actor-Critic algorithm is a reinforcement learning algorithm that combines the policy-gradient approach with the value-function approach. It consists of two parts: the Actor, which represents the policy, and the Critic, which estimates the value function.
The Actor is responsible for tuning the parameter vector $\theta$ of the parameterized policy $\pi_{\theta}(a\mid s)$ by policy gradient, while the Critic approximates the value function with a parameterized vector $w$ and provides the evaluation signal used to update the Actor.
The Actor network can be described as a network that computes the probabilities of all available actions and selects the one with the highest output value, while the Critic network evaluates the selected action by estimating the value of the new state resulting from executing that action [30].
The Deterministic Policy Gradient (DPG) algorithm is a common Actor-Critic algorithm. DPG models the policy as a deterministic policy $a=\mu_{\theta}(s)$ rather than as a probability distribution over actions.
Compared with the stochastic policy gradient, the deterministic policy gradient removes the integral over actions and integrates only over states, so importance sampling over actions is not needed; the gradient becomes equation (12), which greatly improves efficiency:
$\nabla_{\theta}J(\mu_{\theta})=\mathbb{E}_{s\sim\rho^{\mu}}\!\left[\nabla_{\theta}\mu_{\theta}(s)\,\nabla_{a}Q^{\mu}(s,a)\big|_{a=\mu_{\theta}(s)}\right]$ (12)
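The following sketch illustrates one Actor-Critic update with a deterministic actor in the spirit of Eq. (12); the network sizes, learning rates, and the use of a one-step TD target are illustrative assumptions, not the configuration used later in this paper.

```python
import torch
import torch.nn as nn

# Sketch of one Actor-Critic update with a deterministic actor, following
# the deterministic policy gradient in Eq. (12). Network sizes are illustrative.
actor  = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(8 + 2, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt  = torch.optim.Adam(actor.parameters(),  lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ac_step(s, a, r, s_next, gamma=0.99):
    # Critic: regress Q(s, a) toward the one-step TD target.
    with torch.no_grad():
        q_next = critic(torch.cat([s_next, actor(s_next)], dim=1))
        target = r + gamma * q_next
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend grad_theta mu(s) * grad_a Q(s, a)|a=mu(s), i.e. maximize Q(s, mu(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```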
In this study, the ALPRM (adaptive learning path recommendation model) shown in Fig. 1 is constructed based on the deep reinforcement learning framework; the model consists of two layers: dynamic learning environment characterization and adaptive learning path recommendation.
In the dynamic learning environment characterization layer, the core dynamic features among the learner’s personality traits and the domain knowledge features are extracted to characterize the dynamic learning environment. In the adaptive learning path recommendation layer, the main components of the MDP are redefined: the “state” is defined as the representation model of the dynamic learning environment, the “action space” as the set of candidate learning objects, and the “return value” as a function of the difficulty feature of the relevant learning object. The dynamic environment feature variables are used to train the policy network for deep reinforcement learning, and the trained model is finally used to recommend the learning object that best fits the learner’s current learning state.

ALPRM diagram integrating domain knowledge features
In this study, learning goal features and domain knowledge features are added to characterize the dynamic learning environment as a joint representation of the learner’s cognitive state, learning goal, knowledge-point concept coverage, and learning-content difficulty.
In this study, the LSTM model was used to predict the cognitive state of the learner, the Transformer model was used to predict the conceptual coverage of the next knowledge point, and the dynamic difficulty value of the learning object was calculated based on the cognitive state of the learner.
Suppose there is a course $C$ containing a total of $n$ knowledge-point concepts.
The LSTM model is used to predict learners’ mastery of knowledge concepts and to track their cognitive state. The input to the LSTM model is the learner’s historical answer sequence, in which each element encodes the exercise attempted and whether it was answered correctly.
where $\cdot$ denotes the dot product, $\sigma$ denotes the sigmoid activation function, and $W$ and $b$ are the weight matrices and bias vectors of the gates.
When the training of the LSTM model is finished, a learner’s historical answer records are input and the output of the model is his or her mastery of all the knowledge-point concepts of the course, denoted as a cognitive-state vector with one component in $[0,1]$ per concept.
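A compact sketch of such an LSTM-based cognitive-state tracker is given below; the encoding of the (exercise, response) pairs, the number of concepts, and the hidden size are assumptions made for illustration, not the exact architecture of the model above.

```python
import torch
import torch.nn as nn

# Sketch of an LSTM-based cognitive-state tracker: the input at each step is
# an encoding of (exercise attempted, correct / incorrect), and the output is
# the predicted mastery of each knowledge-point concept.
class CognitiveStateLSTM(nn.Module):
    def __init__(self, n_concepts: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2 * n_concepts, hidden_size=hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, n_concepts), nn.Sigmoid())

    def forward(self, answer_seq):
        # answer_seq: [batch, time, 2 * n_concepts]; mastery in [0, 1] per concept.
        out, _ = self.lstm(answer_seq)
        return self.head(out[:, -1, :])          # cognitive state after the last answer

model = CognitiveStateLSTM(n_concepts=10)
history = torch.zeros(1, 5, 20)                   # a learner's 5 most recent (exercise, response) records
print(model(history).shape)                       # torch.Size([1, 10]) -> mastery of K1..K10
```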
The Transformer model is used to predict knowledge-point concept coverage in order to pinpoint the knowledge-point concepts that the learner should learn next. Positional coding is embedded in the input of the Transformer model to characterize the sequential information in the historical learning record; the input of the model is the sum of the embedding of the historical exercise sequence and the corresponding positional encoding.
The Transformer model utilizes a decoder to predict the concept coverage of the next question. The decoder is connected to the encoder through a self-attention mechanism, and the output of the model is finally obtained through a fully connected neural network.
When the training of the Transformer model is finished and a learner’s record of exercises is input, the output of the model is the probability of occurrence of every knowledge-point concept of the course in the next exercise, denoted as a coverage-probability vector with one component per concept, where a larger component indicates that the corresponding concept is more likely to be the one the learner should study next.
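The coverage predictor can be sketched as follows. Note that the model described above uses an encoder-decoder Transformer; for brevity this sketch uses an encoder-only stack with learned positional embeddings, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Simplified sketch of a knowledge-point coverage predictor (encoder-only stand-in
# for the encoder-decoder model described in the text). Sizes are illustrative.
class CoveragePredictor(nn.Module):
    def __init__(self, n_concepts: int, d_model: int = 64, max_len: int = 200):
        super().__init__()
        self.item_emb = nn.Embedding(n_concepts, d_model)
        self.pos_emb  = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Sequential(nn.Linear(d_model, n_concepts), nn.Sigmoid())

    def forward(self, concept_ids):
        # concept_ids: [batch, time] indices of the concepts practised so far.
        pos = torch.arange(concept_ids.size(1), device=concept_ids.device)
        x = self.item_emb(concept_ids) + self.pos_emb(pos)   # embedding + positional coding
        h = self.encoder(x)
        return self.head(h[:, -1, :])   # probability that each concept is covered by the next exercise

pred = CoveragePredictor(n_concepts=10)
print(pred(torch.tensor([[0, 3, 3, 7]])).shape)   # torch.Size([1, 10])
```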
Difficulty is a core factor to consider when recommending study materials for choreography; this study uses Equation (18) and Equation (19) to calculate the difficulty of the exercises.
The Actor-Critic components designed in this paper include: states, actions, and return values.
State: this study treats the dynamic learning environment as the state of the Actor-Critic framework, characterized by the learner’s cognitive state, learning goal, knowledge-point concept coverage, and learning-content difficulty described above.
Action: the policy network is a pre-trained neural network model that accepts the learning-environment state, samples a learning object from the action space according to the output probability distribution, and recommends it to the learner, where the action space is the set of candidate learning objects (exercises) of the course.
Reward value: this study refines the calculation of the reward value by giving the agent a reward both at each step of its exploration and at the end of the exploration; the reward function is designed to combine these step-wise and terminal rewards.
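As a purely hypothetical illustration of how these three components might be assembled in code, the sketch below concatenates the environment features into a state vector and shapes a step reward from goal matching and difficulty fit; the actual reward formula and weights of this paper are defined by the equations in this subsection and are not reproduced here.

```python
import numpy as np

# Hypothetical illustration only: the terms and weights below are assumptions,
# not the reward function defined in this paper.
def build_state(cognitive_state, goal_mask, coverage_prob, difficulty):
    # Concatenate the dynamic-environment features into one state vector.
    return np.concatenate([cognitive_state, goal_mask, coverage_prob, [difficulty]])

def step_reward(recommended_concepts, goal_mask, difficulty, mastery, done, w=(1.0, 0.5, 2.0)):
    # Reward a recommendation whose concepts match the learning goal and whose
    # difficulty sits near the learner's current mastery; add a terminal bonus.
    goal_match = float(np.mean(goal_mask[recommended_concepts]))
    difficulty_fit = 1.0 - abs(difficulty - (1.0 - float(np.mean(mastery))))
    return w[0] * goal_match + w[1] * difficulty_fit + (w[2] if done else 0.0)
```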
The D3QN algorithm is used to realize the choreography study-material recommendation function. Two Q-networks of the same structure are maintained: a current network and a target network. The current network selects the action with the largest estimated value, while the target network evaluates the selected action, which alleviates the overestimation problem of standard DQN, where $\theta$ denotes the parameters of the current network and $\theta^{-}$ those of the target network.
After the algorithm is run for many iterations, the policy network is trained. When all the variables in the dynamic learning environment model constructed above are input into the neural network, the corresponding exercises can be output.
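A sketch of the two networks implied by this design is given below: a dueling Q-network and a Double-DQN target in which the current network selects the exercise and the target network evaluates it. Layer sizes and the exact head structure are illustrative assumptions, not the configuration used in this study.

```python
import torch
import torch.nn as nn

# Sketch of a dueling Q-network plus a Double-DQN target computation:
# the current network selects the exercise, the target network evaluates it.
class DuelingQNet(nn.Module):
    def __init__(self, state_dim: int, n_exercises: int):
        super().__init__()
        self.feat = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)
        self.adv   = nn.Linear(64, n_exercises)

    def forward(self, s):
        h = self.feat(s)
        a = self.adv(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(q_net, target_net, r, s_next, done, gamma=0.99):
    with torch.no_grad():
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)        # action selection: current network
        q_eval = target_net(s_next).gather(1, best_a).squeeze(1)  # action evaluation: target network
        return r + gamma * (1 - done) * q_eval
```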
Considering the characteristics of personalized learning path recommendation and the current research status, this paper adopts the “Dance Choreography” learning platform constructed with JSP+MySQL technology as the experimental object, and analyzes the experimental effect of the constructed personalized learning path recommendation model.
The “Dance Choreography” e-learning platform consists of four modules: learning navigation, learning resources, problem solving and exploration, and learning interaction. The knowledge items in the modules are categorized according to the chapters of the knowledge points. The learning navigation module consists of learning objectives, the knowledge tree, and key and difficult points. The learning resources module consists of videos, e-lessons, and textbooks. The problem solving and exploration module consists of example-problem analysis, exercises, and quizzes. The learning interaction module consists of discussion forums.
In order to improve the system’s extraction and mining of learning users’ access paths, the original log data need to be further pre-processed. First, the knowledge items under each learning module are redefined and mapped to identifiers, as shown in Table 1.
Knowledge item mapping
| Learning module | Knowledge item | Mapping |
|---|---|---|
| Topic selection | Overall design | K1 |
| Topic selection | Material selection and design | K2 |
| Movement design | Structure design | K3 |
| Movement design | Movement arrangement | K4 |
| Stage presentation | Music selection | K5 |
| Stage presentation | Creation and performance | K6 |
| Stage presentation | Stage composition | K7 |
| Movement foundation | Basic techniques | K8 |
| Movement foundation | Basic dance steps | K9 |
| Movement foundation | Common dance poses | K10 |
In this paper, a web-page data collector was used to capture the learning data of 80 users from the log of the web platform. Considering that the dance choreography learning materials limit the scope of learning modules selected by users, the experiment chooses “Practice of Dance Choreography”, which has a comprehensive distribution of learning modules, as the experimental collection area. The number of visits to learning-user nodes, the learning paths, and the test scores are obtained. The learning-user node access volume refers to each user’s click count and dwell time on each knowledge-item node, as shown in Table 2.
Learner’s node traffic
| Knowledge item | Clicks | Duration (seconds) |
|---|---|---|
| K1 | 103 | 41856 |
| K2 | 97 | 28965 |
| K3 | 106 | 39604 |
| K4 | 471 | 299521 |
| K5 | 434 | 253799 |
| K6 | 240 | 217802 |
| K7 | 366 | 342194 |
| K8 | 584 | 344995 |
| K9 | 317 | 208394 |
| K10 | 281 | 120122 |
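To illustrate how raw platform logs can be turned into the node-traffic statistics of Table 2, the following sketch maps knowledge-item names to the identifiers of Table 1 and accumulates clicks and dwell time; the log record format (user, knowledge item, seconds) is an assumption, since the platform’s actual log schema is not described here.

```python
from collections import defaultdict

# Illustrative preprocessing of platform logs into Table 2-style node traffic.
# Only the K1-K10 mapping follows Table 1; the log record format is assumed.
ITEM_MAP = {"Overall design": "K1", "Material selection and design": "K2",
            "Structure design": "K3", "Movement arrangement": "K4",
            "Music selection": "K5", "Creation and performance": "K6",
            "Stage composition": "K7", "Basic techniques": "K8",
            "Basic dance steps": "K9", "Common dance poses": "K10"}

def node_traffic(log_records):
    clicks, duration = defaultdict(int), defaultdict(int)
    for user, item_name, seconds in log_records:
        node = ITEM_MAP[item_name]
        clicks[node] += 1                 # one click per visited record
        duration[node] += seconds         # accumulated dwell time
    return clicks, duration

clicks, duration = node_traffic([("u1", "Overall design", 120), ("u2", "Basic techniques", 300)])
print(clicks["K1"], duration["K8"])       # 1 300
```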
Based on the method for building models of similar learning users, 8 groups of similar user clusters were established for the 80 learning users, and the recommended TopN-1 was calculated. The parameters were initialized and computed according to the ant colony algorithm, mainly including the similarity values between users and between learning styles; the resulting similar learning paths and personalized recommendations (TopN-2) of each group are shown in Table 3.
The learning path and personalized recommendation of similar user groups
| Learning level | Similar user group | Similar learning path | TopN-2 |
|---|---|---|---|
| 90-100 | A1 | K1, K3, K2, K4, K5, K6, K7, K8 | K9 (70%) |
| 90-100 | A2 | K1, K2, K3, K5, K6, K4, K7, K8 | K10 (13%) |
| 80-90 | B1 | K2, K4, K7, K6, K9, K10, K8 | K3 (71%) |
| 80-90 | B2 | K2, K3, K4, K5, K7, K9 | K6 (80%), K8 (45%) |
| 70-80 | C1 | K5, K4, K6, K8, K9 | K3 (60%) |
| 70-80 | C2 | K4, K6, K7, K10, K8 | K2 (50%), K3 (65%) |
| 60-70 | D1 | K5, K4, K10, K7 | K2 (55%), K3 (35%), K9 (60%) |
| 60-70 | D2 | K6, K9, K10 | K3 (51%), K4 (63%), K8 (72%), K9 (43%) |
Starting from the target requirements of personalized learning path recommendation, we introduce two performance indicators: learning efficiency and misguidance-control effectiveness. Learning efficiency denotes the rate of improvement in learning performance after a period of continuous use of the personalized learning path recommendation. Misguidance-control effectiveness is measured by the increase in the concentration of knowledge-item check-ins after learning users adopt the personalized learning path recommendation scheme, compared with before. The higher the concentration of knowledge-item check-ins, the better the problem of learners getting lost during online learning is controlled.
To this end, a definition of knowledge quantity is first introduced: an e-learning platform is a network system composed of multiple knowledge nodes, each of which consists of a number of knowledge items.
Five dance choreography learning users were randomly selected from each of the eight similar user groups for personalized learning path recommendation, and the misguidance-control rate of the personalized learning path recommendation was obtained for each group, as shown in Table 4.
The misguidance control rate for personalized learning paths
| Learning level | Similar user group | Average misguidance control rate |
|---|---|---|
| 90-100 | A1 | 2.3 |
| 90-100 | A2 | 1.1 |
| 80-90 | B1 | 6.0 |
| 80-90 | B2 | 6.9 |
| 70-80 | C1 | 11.5 |
| 70-80 | C2 | 11.7 |
| 60-70 | D1 | 15.4 |
| 60-70 | D2 | 16.7 |
Comparing the density of choreography knowledge-item check-ins before and after the personalized learning path recommendation, Figures 2 and 3 show the check-in density before and after the recommendation, respectively. Figure 4 shows the trend of achievement after the personalized learning path recommendation.

The knowledge item check-in density before recommendation

The knowledge item check-in density after recommendation

The development trend of dance performance after recommendation
The data show that the personalized learning path recommendation exerts a clear guiding and controlling effect on the learning of choreography course learners: after accepting the recommended learning content, learners’ knowledge-item check-in density is significantly higher than before the recommendation. After receiving the path recommendation, the learners’ dance choreography performance also improves; in particular, learners whose choreography level was between 60-70 points and 70-80 points improve significantly.
In order to test whether the teaching strategy of using reinforcement learning for personalized recommendation of learning paths in dance choreography is practically effective, two classes of dance majors in a university were randomly selected as an experimental class (N=43) and a control class (N=45). The new teaching strategy was applied in the experimental class, while the control class adopted the original teaching strategy. Before the experiment began, the two classes were pretested and compared in terms of their performance levels in dance choreography, and it was found that there was no significant difference between the pretest scores of the two classes (p>0.05), so it was considered that the two classes were homogeneous and fulfilled the requirements of the experiment.
The post-test data are the end-of-semester exam results at the end of the teaching experiment on optimized, recommended dance choreography learning paths. The students chose their dance choreography test questions by drawing lots and completed the choreography of a work on the spot, and each student’s choreography exercise was scored and judged separately by the dance instructors of the preschool education major. The post-test scores obtained are shown in Table 5; the average score of the experimental class was 0.54 points higher than that of the control class, and a further t-test analysis was carried out to test whether the difference was statistically significant.
The performance of the students’ dance choreography skills
| Items | Class A (experimental) | Class B (control) |
|---|---|---|
| Overall design | 7.27 | 6.93 |
| Material selection and design | 6.88 | 6.45 |
| Music selection | 6.85 | 6.40 |
| Structure design | 6.60 | 5.79 |
| Movement arrangement | 6.48 | 5.88 |
| Stage composition | 6.82 | 6.38 |
| Basic dance steps | 6.64 | 6.19 |
| Common dance poses | 6.82 | 6.10 |
| Basic techniques | 6.79 | 6.26 |
| Creation and performance | 6.52 | 5.88 |
| Mean | 6.77 | 6.23 |
The pre-test and post-test scores of the experimental class were entered into SPSS 19.0 for a paired-samples t-test; the results are shown in Table 6. The obtained Sig. value is 0.000, smaller than the critical value of 0.05, so the null hypothesis that there is no significant difference between the two population means can be rejected. In other words, the paired-samples t-test on the scores measured before and after the implementation of the personalized learning path recommendation-oriented dance choreography teaching experiment shows a significant difference for the experimental class, with the post-test scores markedly higher than the pre-test scores.
The result of paired sample t test
| | Pre-test data (N=43) | Post-test data (N=43) | t | P (sig.) |
|---|---|---|---|---|
| Mean | 6.23 | 6.77 | 13.871 | 0.000* |
From this, it can be judged that the experimental class students’ performance in the dance choreography course improved more significantly at the end of the experiment applying the new teaching strategy than before the experiment began.
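The same paired-samples t-test can be reproduced outside SPSS, for example with SciPy; the score lists below are placeholders rather than the study’s raw data, since only class-level means are reported here.

```python
from scipy import stats

# Paired-samples t-test in Python; the scores below are placeholders,
# not the study's raw data (only class means are reported in the text).
pre_test  = [6.1, 6.4, 5.9, 6.5, 6.3]     # pre-test scores of the same students
post_test = [6.7, 6.9, 6.4, 7.0, 6.8]     # post-test scores after the experiment
t_stat, p_value = stats.ttest_rel(post_test, pre_test)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")   # p < 0.05 -> reject the null hypothesis of equal means
```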
The t-test results for each sub-item of the experimental class are shown in Table 7. The p-values obtained from the paired samples t-tests of the pre- and post-test scores of each sub-item of the students in the experimental class A were all less than the critical value of 0.05.
T test results of each item of the experimental class
| Items | Pre-test | Post-test | P (sig.) |
|---|---|---|---|
| Overall design | 6.95 | 7.27 | 0.007 |
| Material selection and design | 6.23 | 6.88 | 0.000 |
| Music selection | 6.51 | 6.85 | 0.013 |
| Structure design | 5.91 | 6.60 | 0.000 |
| Movement arrangement | 5.97 | 6.48 | 0.000 |
| Stage composition | 6.41 | 6.82 | 0.001 |
| Basic dance steps | 6.38 | 6.64 | 0.000 |
| Common dance poses | 6.02 | 6.82 | 0.000 |
| Basic techniques | 5.92 | 6.79 | 0.000 |
| Creation and performance | 6.01 | 6.52 | 0.000 |
After the implementation of the personalized learning path recommendation-oriented choreography teaching experiment, the post-test scores of the experimental class in the three previously weak areas of “structure design”, “movement arrangement”, and “basic techniques” increased significantly compared with the pre-test scores, by 11.68%, 8.54%, and 16.69% respectively (for example, “structure design” rose from 5.91 to 6.60, an increase of (6.60 − 5.91)/5.91 ≈ 11.7%). This means that the weak links of the experimental class students in dance choreography were effectively improved in this learning path recommendation teaching experiment, and their dance choreography scores improved markedly compared with those before the experiment began.
The pre-test and post-test scores of the control class B students were likewise entered into SPSS 19.0 for paired-samples t-tests. The results of the paired-samples t-tests for the control class on the pre-test and post-test are shown in Table 8.
The result of the paired sample t test for the control class
| Items | Pre-test | Post-test | P (sig.) |
|---|---|---|---|
| Overall design | 6.81 | 7.09 | 0.264 |
| Material selection and design | 6.30 | 6.31 | 0.452 |
| Music selection | 6.75 | 6.33 | 0.276 |
| Structure design | 5.50 | 5.63 | 0.166 |
| Movement arrangement | 5.91 | 5.68 | 0.395 |
| Stage composition | 6.42 | 6.33 | 0.381 |
| Basic dance steps | 6.24 | 6.60 | 0.213 |
| Common dance poses | 5.76 | 6.21 | 0.460 |
| Basic techniques | 6.73 | 6.50 | 0.509 |
| Creation and performance | 5.90 | 6.02 | 0.275 |
| Mean | 6.232 | 6.27 | 0.541 |
The Sig. value of the t-test for the overall mean score is 0.541 > 0.05, so the null hypothesis that there is no significant difference in the overall level of dance choreography before and after the experiment in the control class cannot be rejected; that is, although the control class students’ dance choreography performance increased slightly after receiving the traditional mode of teaching, the increase is not statistically significant. In the t-tests for each sub-dimension of choreography achievement, the Sig. values of all sub-dimensions were likewise greater than 0.05, indicating that the differences between pre- and post-test scores for each sub-dimension were not significant.
It can be inferred that the students’ dance choreography level did not improve significantly after the control class was taught with traditional teaching strategies, and that the teaching effect was not as good as that of the personalized learning path optimization and recommendation teaching strategy proposed in this paper.
In this study, a personalized learning path recommendation model is designed based on a deep reinforcement learning algorithm, which can dynamically provide students with appropriate choreography learning content according to the learning environment. The results show that the density of dance choreography knowledge-item check-ins increases significantly after recommendation with the method of this paper, and the learning users’ dance choreography scores all show an upward trend while learning with this method. In the comparative teaching-strategy practice, the post-test performance of the experimental class using the new teaching strategy is 0.54 points higher than that of the control class (p<0.05), and all sub-items of dance choreography improve significantly; among them, the optimization effects on “structure design”, “movement arrangement”, and “basic techniques” are the most obvious, with improvement rates of 11.68%, 8.54%, and 16.69% respectively. In contrast, the choreography level of the control class did not improve significantly. Accordingly, this paper concludes that the proposed deep reinforcement learning algorithm can effectively optimize personalized learning paths for dance choreography and has reliable effects in teaching practice.