Open Access

Optimization and Recommendation System Design of Digital Resources for Civic and Political Education for College Students

  
Sep 25, 2025


Introduction

Doing a good job with the Civic and Political Theory Course ultimately means fully implementing the Party's education policy and answering the fundamental questions of what kind of people to cultivate, how to cultivate them, and for whom to cultivate them. The Civic and Political Theory Course is an important platform for conducting civic and political education, an important channel for publicizing socialist core values, and an important position for strengthening the study of socialist thought with Chinese characteristics in the new era [1-2]. The level and effect of the course directly affect the effectiveness of civic and political education in colleges and universities. Enhancing its teaching effectiveness and promoting its innovative reform depend on the medium of information technology, the construction of an integrated teaching and research team, and the optimization of digital resources for civic and political education, all of which strengthen the ideological, theoretical, and practical effectiveness of the course [3-6]. Building on theoretical research into the efficient integration of information technology with civic and political education, this line of work summarizes practical teaching experience in colleges and universities and provides a new development path for improving the effectiveness of civic and political education [7-9].
At present, most college civic and political education is delivered through large collective classes and open classes, which suffer from a single format, poor relevance, a lack of synergy, and an inability to form a personalized collaborative nurturing mechanism. Given these shortcomings, and in view of the new demands for optimizing digital resources for civic and political education, a course recommendation system for college civic and political education can be designed [10-13]. The system follows the general design principles of software engineering. Its development and application open a new path for college civic and political education, help improve students' comprehensive quality in the course, and have positive significance in creating a collaborative, win-win, diversified, and personalized atmosphere for civic and political education in colleges and universities [14-17].

This paper improves the similarity calculation of the collaborative filtering algorithm by adding a popular-resource penalty factor and a time-decay penalty factor. At the same time, a content-based recommendation algorithm is used to extract features from the labels of course Civics resources. The similarities from the content-based and collaborative filtering algorithms are fused with certain weights to form the similarity of the hybrid recommendation algorithm, which is then used to predict recommendations of course Civics resources. Finally, the accuracy and adaptability of the hybrid recommendation algorithm are analyzed.

Hybrid Recommendation Algorithm for Civic and Political Education Digital Resources
Acquisition and pre-processing of experimental datasets
Data set acquisition

This paper uses a behavioral dataset of college students from an online course on an educational platform; the dataset serves both as a data source for big-data competitions and as material for researchers studying the ideological and political education industry. From this platform, the learning dataset of student users enrolled in Civic and Political Education courses is selected for study. To recommend course resources to users accurately, it is necessary to understand the content of each table in the dataset, which in turn facilitates data pre-processing.

Data pre-processing

The raw dataset contains many feature columns that cannot be used in later training, as well as missing rating values, which hinders training of the improved algorithmic model; the original dataset therefore needs to be cleaned. If some data formats in the dataset cannot be consumed by the model, the data must be converted. After these steps, the data may still be relatively scattered and lack coherent structure, so data integration is used to merge some feature columns into the final experimental dataset.

Data Cleaning

Data cleaning is usually the first step of dataset preprocessing. It covers several cases: missing values, invalid feature columns, duplicate data, and so on. So that the algorithmic model can use the dataset more efficiently, the following operations are performed:

Deletion of useless feature columns. Some datasets contain feature columns with only null or duplicate values, which carry no valid information. This paper therefore deletes registration time, user password, user name, login time, and number of followed courses from the user personal-information dataset; course entry time and instructor name from the course resource-information dataset; and the whether-evaluated flag from the user course-selection dataset, to facilitate subsequent operations on the dataset.

Filling of missing values. The user course-evaluation dataset contains a large amount of missing rating data. Common treatments include filling in the missing content, deleting feature columns with missing values, or training directly on the incomplete dataset; the latter two cause information loss that affects the final experimental results. This paper therefore mainly fills in the missing content.
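The mean-fill strategy described above can be sketched with pandas; the column names and values here are illustrative stand-ins, not the platform's actual schema:

```python
import pandas as pd

# Illustrative ratings table; column names are hypothetical, not the
# actual fields of the platform's dataset.
ratings = pd.DataFrame({
    "user_id":   [1, 1, 2, 2, 3],
    "course_id": [10, 11, 10, 11, 10],
    "rating":    [4.0, None, 3.0, 5.0, None],
})

# Fill each missing rating with the mean of the observed ratings for
# that course, so no rows (and no information) are discarded.
ratings["rating"] = (ratings.groupby("course_id")["rating"]
                            .transform(lambda s: s.fillna(s.mean())))
```

Filling with the per-course mean, rather than a global mean, keeps the filled values consistent with how each resource is typically rated.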

Data transformation

Data normalization, discretization, and feature coding are common data-conversion operations aimed at turning data into a format the algorithmic model can understand. This paper mainly applies a unified specification to the course-length and study-duration features.
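One way to put the course-length and study-duration features on a unified scale is min-max normalization; this is a minimal sketch (the exact normalization used in the paper is not specified), with illustrative values:

```python
def min_max_scale(values):
    """Rescale a list of numeric values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant column: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Course length and study duration (minutes, illustrative) on one scale.
course_len = min_max_scale([30, 45, 60, 90])
study_len  = min_max_scale([5, 20, 35, 50])
```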

Data Integration

This paper focuses on user course-rating behavior, which is spread across four separate datasets; these four datasets therefore need to be integrated. Their correspondence is shown in Figure 1.
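The integration step can be sketched as a sequence of key joins with pandas. The four tables, their keys, and their field names below are assumptions for illustration, not the real schema:

```python
import pandas as pd

# Four illustrative tables standing in for the paper's scattered datasets.
users   = pd.DataFrame({"user_id": [1, 2], "grade": ["2021", "2022"]})
courses = pd.DataFrame({"course_id": [10, 11], "title": ["Ethics", "Law"]})
enrolls = pd.DataFrame({"user_id": [1, 1, 2], "course_id": [10, 11, 10]})
ratings = pd.DataFrame({"user_id": [1, 1, 2], "course_id": [10, 11, 10],
                        "rating": [4, 5, 3]})

# Join along the shared keys so every rating row carries user and course info.
merged = (enrolls
          .merge(ratings, on=["user_id", "course_id"], how="left")
          .merge(users,   on="user_id",   how="left")
          .merge(courses, on="course_id", how="left"))
```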

Figure 1.

Schematic diagram of data integration

Selection of hybrid recommendation algorithms

To improve the real-time performance, accuracy, and applicability of the recommendation model for civic and political education, this paper proposes a hybrid recommendation algorithm based on collaborative filtering and resource content; its flow is shown in Figure 2. Since the college students using the learning system are relatively fixed and at the same level, and many resources have common ratings, Pearson similarity is used as the similarity measure of the recommendation model. However, this method cannot handle the cold-start problem caused by sparse student and resource data, so penalty factors are introduced into the collaborative filtering mechanism to improve the Pearson correlation coefficient and compensate for data sparsity. For the system's cold-start problem, relevant labels can be attached to Civics resources when they are uploaded. A content-based recommendation algorithm extracts features from these labels, analyzes the similarity between the target resource and other resources, predicts ratings for the new resource, and uses the predicted ratings to populate the original student-resource matrix, effectively solving the cold-start problem [18-19].

Figure 2.

Hybrid recommendation algorithm

Improvement of Similarity Calculation
Similarity Improvement for Collaborative Filtering Algorithms

To improve the recommendation accuracy of the algorithm, this paper introduces two penalty factors into the original Pearson similarity calculation [20-21]: a popular-resource penalty factor and a time-decay penalty factor. The improved similarity calculation proceeds as follows:

The first step is to construct the user-resource scoring matrix. The construction of the matrix is completed according to the number of users and resources in the dataset and the rating information of users on resources.

In the second step, the Pearson correlation coefficient is used to calculate the similarity between Civics resources, as shown in equation (1), where $U_{mn}$ denotes the set of users who have rated both resource m and resource n, $r_{um}$ and $r_{un}$ denote user u's ratings of resources m and n, and $\bar{r}_m$ and $\bar{r}_n$ denote the mean ratings of resources m and n over a certain time period.

$$\mathrm{sim}(m,n)=\frac{\sum_{u\in U_{mn}}\left(r_{um}-\bar{r}_m\right)\left(r_{un}-\bar{r}_n\right)}{\sqrt{\sum_{u\in U_{mn}}\left(r_{um}-\bar{r}_m\right)^2}\times\sqrt{\sum_{u\in U_{mn}}\left(r_{un}-\bar{r}_n\right)^2}}\tag{1}$$
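Equation (1) can be sketched over a user-resource rating matrix as follows; the matrix values are illustrative, with 0 standing for "not rated":

```python
import numpy as np

def pearson_item_sim(ratings, m, n):
    """Pearson similarity between resources m and n (equation (1)).

    `ratings` is a user x resource matrix with 0 marking "not rated";
    only users who rated both resources (the set U_mn) are used.
    """
    both = (ratings[:, m] > 0) & (ratings[:, n] > 0)
    if both.sum() < 2:
        return 0.0                       # too few common raters
    dm = ratings[both, m] - ratings[both, m].mean()
    dn = ratings[both, n] - ratings[both, n].mean()
    denom = np.sqrt((dm ** 2).sum()) * np.sqrt((dn ** 2).sum())
    return float((dm * dn).sum() / denom) if denom else 0.0

# Illustrative 4-user x 3-resource rating matrix.
R = np.array([[5, 4, 0],
              [3, 2, 1],
              [4, 3, 5],
              [1, 1, 4]], dtype=float)
```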

In the third step, a popular-resource penalty factor is added as a weight. The course Civics learning system in this paper aims to personalize recommendations of high-quality Civics resources that students are interested in, so the influence of popular resources on the similarity must be reduced and the weight of ratings on non-popular resources appropriately increased to achieve accurate recommendation. The introduced weight is shown in equation (2), where $I_{uv}$ is the set of resources rated by both user u and user v, $r_{ui}$ and $r_{vi}$ are the ratings of users u and v for resource i, and $\sum_{i\in I_u} r_{ui}$ and $\sum_{i\in I_v} r_{vi}$ are the sums of the ratings each user has given to the resources they rated. This weight improves the similarity calculation by taking, as a penalty factor, the ratio of the sum over the jointly rated resources to the product of each user's own rating sums, eliminating the tendency of popular resources to distort recommendation accuracy.

$$\theta=\frac{\sum_{i\in I_{uv}} r_{ui}\times r_{vi}}{\sum_{i\in I_u} r_{ui}\times\sum_{i\in I_v} r_{vi}}\tag{2}$$

In the fourth step, a time-decay penalty factor is added, mainly because users' browsing of resources changes over time. Taking the Civics course as an example, as the teaching schedule advances, the content and technical points students learn deepen, and each chapter's Civics elements and specific knowledge points are closely related, so resources whose ratings are separated by too long an interval should be penalized. The time-decay weight is shown in equation (3), where $t_{um}$ and $t_{un}$ denote the times at which user u rated resources m and n. The weight function takes values in the range 0-1 and varies nonlinearly with the rating-time difference, highlighting the time decay of user preferences and remedying the inadequacy of traditional similarity calculations that rely only on ratings, so that desired resources can be recommended more objectively and accurately.

$$f(t)=\frac{e^{t_{um}-t_{un}}}{1+e^{t_{um}-t_{un}}}\tag{3}$$
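The two penalty factors can be sketched directly from the printed forms of equations (2) and (3); the argument names are illustrative, and the sigmoid in `time_decay` follows equation (3) as given (it will overflow for very large time gaps, which small rating-time units avoid):

```python
import math

def popularity_penalty(common, ru_sum, rv_sum):
    """theta of equation (2).

    common -- list of (r_ui, r_vi) rating pairs over the co-rated set I_uv
    ru_sum -- sum of user u's ratings over all resources u has rated
    rv_sum -- likewise for user v
    """
    return sum(rui * rvi for rui, rvi in common) / (ru_sum * rv_sum)

def time_decay(t_um, t_un):
    """f(t) of equation (3): a sigmoid of the rating-time difference,
    bounded in (0, 1)."""
    d = t_um - t_un
    return math.exp(d) / (1.0 + math.exp(d))
```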

Combining these two penalty factors with the Pearson correlation coefficient not only reduces the rating share of overly active resources but also reflects the shift in user preferences over time. The similarity formula combining the two penalty factors is shown in equation (4).

$$\mathrm{sim}_1(m,n)=\theta\times f(t)\times\frac{\sum_{u\in U_{mn}}\left(r_{um}-\bar{r}_m\right)\left(r_{un}-\bar{r}_n\right)}{\sqrt{\sum_{u\in U_{mn}}\left(r_{um}-\bar{r}_m\right)^2}\times\sqrt{\sum_{u\in U_{mn}}\left(r_{un}-\bar{r}_n\right)^2}}\tag{4}$$

Similarity Improvement Based on Recommendations after Content Optimization

To solve the cold-start problem that arises when college students first enter the system and still produce accurate recommendations, the content information of the resources must also be optimized and quantified. Labels describe the content and form of resources in a generalized way; labeling resources helps users quickly filter and browse for what they want and also facilitates the system's mining and analysis of resource data.

Firstly, the weight of each label on a resource is defined and the resource-label matrix is constructed as shown in equation (5), where a denotes the number of resources, b the number of label attributes, and $w_{ab}$ the bth label attribute of the ath resource; $w_{ab}$ is 1 if resource a has that label attribute and 0 otherwise.

$$R=\begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1b} \\ w_{21} & w_{22} & \cdots & w_{2b} \\ \vdots & \vdots & & \vdots \\ w_{a1} & w_{a2} & \cdots & w_{ab} \end{bmatrix}\tag{5}$$

After constructing the resource-label matrix, the weight of a label for a user is calculated as shown in equation (6), where $r(t_m)$ denotes the sum of the user's ratings for resources carrying label $t_m$, $r(u_i)$ the sum of the user's ratings for all resources, a the total number of resources, and b the number of resources carrying label $t_m$.

$$w=\frac{r(t_m)}{r(u_i)}\times\frac{b}{a}\tag{6}$$

Then the user's interest features and the resources' label attributes are vectorized using a spatial vector representation: the user's interest features can be written as $U_m = \{(t_1, w_1), (t_2, w_2), \ldots, (t_m, w_m)\}$ and a resource's feature vector as $V_n = \{(t_1, w_1), (t_2, w_2), \ldots, (t_n, w_n)\}$, where t denotes a label attribute and w the weight it carries. Comparing the content attributes of the resources yields the similarity between them, as shown in equation (7).

$$\mathrm{sim}_2(m,n)=\frac{U_m\cdot V_n}{|U_m|\times|V_n|}\tag{7}$$
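Equation (7) is the cosine similarity of the two weight vectors; a minimal sketch over illustrative label-weight vectors (assuming both vectors are ordered over the same tag set):

```python
import math

def cosine_sim(u, v):
    """sim2 of equation (7): cosine of the angle between two
    label-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Illustrative weight vectors over the same ordered tag set.
Um = [0.6, 0.8, 0.0]
Vn = [0.6, 0.8, 0.0]
```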

Predicting the Effectiveness of Hybrid Recommendation Algorithms

Combining the penalty-factor-improved Pearson similarity $\mathrm{sim}_1(m,n)$ with the content-based similarity $\mathrm{sim}_2(m,n)$ yields the resource similarity of the hybrid recommendation algorithm, as shown in equation (8), where $\alpha$ is the weight of the similarity calculation method, with values in the range 0-1; different weights can be taken according to differences between resources, giving different similarities.

$$\mathrm{sim}(m,n)=\alpha\times\mathrm{sim}_1(m,n)+(1-\alpha)\times\mathrm{sim}_2(m,n)\tag{8}$$

Using the improved hybrid similarity to find the nearest-neighbor set of the target resource, user u's predicted rating of resource m is given by equation (9), where $p_{um}$ is the predicted rating, $S(m,K)$ is the set of the K most similar neighbors of resource m (the candidates for the recommendation list), $\bar{r}_m$ is the mean rating of resource m, $r_{un}$ is user u's rating of resource n, and $\bar{r}_n$ is the mean rating of resource n.

$$p_{um}=\bar{r}_m+\frac{\sum_{n\in S(m,K)}\mathrm{sim}(m,n)\times\left(r_{un}-\bar{r}_n\right)}{\sum_{n\in S(m,K)}\left|\mathrm{sim}(m,n)\right|}\tag{9}$$

Finally, for the target user, all item sets are traversed, the resources for which the user has no behavioral records yet are selected, and the candidate list is generated by sorting these resources in descending order of predicted rating. The top N resources of the candidate list form the Top-N recommendation list.
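The fusion, prediction, and ranking steps of equations (8)-(9) and the Top-N selection can be sketched together; the data shapes here (neighbor triples, a score dictionary) are illustrative assumptions:

```python
def fuse_sim(s1, s2, alpha):
    """Equation (8): weighted fusion of the two similarities."""
    return alpha * s1 + (1 - alpha) * s2

def predict_score(r_mean_m, neighbors):
    """p_um of equation (9).

    r_mean_m  -- mean rating of target resource m
    neighbors -- (sim(m, n), r_un, r_mean_n) triples for the K nearest
                 neighbours S(m, K) that user u has rated
    """
    num = sum(s * (r_un - r_n_bar) for s, r_un, r_n_bar in neighbors)
    den = sum(abs(s) for s, _, _ in neighbors)
    return r_mean_m if den == 0 else r_mean_m + num / den

def top_n(predictions, n):
    """Sort unseen resources by predicted score; keep the first n."""
    ranked = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
    return [res for res, _ in ranked[:n]]
```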

Recommendation algorithm evaluation metrics

Depending on the recommendation task, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are commonly used to evaluate the accuracy of predicted user ratings, while Precision and Recall are used to evaluate the quality of the recommendation lists a recommendation system generates.

Precision indicates how many items in the recommendation list the user actually likes, and is defined in equation (10).

$$\mathrm{precision}=\frac{1}{n}\sum_{i=1}^{n}\frac{|L_i\cap R_i|}{|L_i|}\tag{10}$$

where $L_i$ is the recommendation list generated by the recommendation algorithm for user i, and $R_i$ is the set of all items user i likes in the test set.

Recall measures the proportion of the user's favorite items that appear in the recommendation list; it is defined in equation (11) as the number of the user's favorite items in the list divided by the number of all items the user actually likes. A higher Recall therefore indicates that the recommender system covers the user's preferences better and provides more recommendations matching the user's interests.

$$\mathrm{recall}=\frac{1}{n}\sum_{i=1}^{n}\frac{|L_i\cap R_i|}{|R_i|}\tag{11}$$
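For a single user, equations (10) and (11) reduce to simple set ratios; averaging the per-user values over all n users gives the paper's metrics. A minimal sketch:

```python
def precision_recall(recommended, liked):
    """Per-user Precision and Recall (equations (10) and (11)):
    the share of the recommendation list the user likes, and the
    share of the user's liked items the list covers."""
    hits = len(set(recommended) & set(liked))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    return precision, recall
```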

Results and discussion
Data Preparation of Recommendation Algorithm for Civic Education Digital Resources
Classification and annotation of learning resource datasets

The online resources of the learning-resource recommendation system commonly include test questions, documents, PPTs, video, audio, source code, pictures, graphics, links to experimental platforms, and other types. A "sourceType" field is set in the database to label each resource type; the classification of the sourceType field for the civic and political education digital resources is shown in Table 1 below. Because a running system is involved, table names are not provided, only examples of field values. Each resource type in the table also carries detailed labeling fields such as name, number, original file name, and storage file name. This annotation, categorization, and automatic marking support better management and utilization of learning resources.

Table 1. Learning resource classification

Resource type    Database sourceType ID
Document         sourceType=6
Video            sourceType=1
Audio            sourceType=3
Source code      sourceType=5
Picture          sourceType=2
Text             sourceType=4

During video labeling, in addition to common test attributes such as the video's identifier, question number, title, and difficulty, this paper also labels difficulty and similarity attribute fields for better management and utilization of learning resources. The collaborative filtering hybrid recommendation algorithm for civic and political education digital resources designed in this paper takes as its task intelligent video recommendation based on learners' video-watching behavior. On this basis, the paper further designs and implements the requirement analysis and functional design for learning-resource recommendation in the intelligent assessment system.

Classification of students’ learning styles

Drawing on the ideas of student stratification and resource partitioning, this paper uses statistical methods to categorize the learning styles and growth patterns of students in a class based on the historical click-through rates of civic and political education videos at a university.

According to how actively they click on videos, students were categorized as "Class A: active learners, Class B: potential learners, and Class C: inactive learners". The historical data curves of the three categories are shown in Figure 3, from which their learner identity labels and group characteristics can be summarized:

Figure 3.

Historical data curve

Compared with the other two groups, learners in Group A watch more educational video resources, and their learning-ability values remain high; we define them as active learners: they are both highly engaged and highly capable. Learners in Group B watch significantly fewer educational videos than Group A, yet their proficiency remains high, almost equal to Group A's; we call them potential learners: although they have studied only a small number of videos, they are highly competent and have great untapped potential. Learners in Group C participate in the fewest educational videos and have relatively low ability values; they are inactive throughout the learning process and their learning outcomes are poor, so we define them as inactive learners. Since they seldom study educational videos and their ability is weak, it is difficult to generalize a regular learning pattern from their behavior.

The video-click frequency of the three groups decreases from category A to B to C, averaging 75.89, 66.11, and 58.17 clicks respectively. For category A students, the focus is on spurring them on and stimulating interest: the recommendation weight of less difficult test questions is reduced to keep their knowledge and ability growth challenging and their learning motivation high. For category B students with a moderate foundation, following the "three-zone theory", questions of moderate difficulty are recommended to keep their learning consistent. For category C students, encouragement is the focus, and the recommendation weight of difficult questions is reduced to protect their learning motivation.

Performance Analysis of Recommendation Algorithm for Civic Education

In this paper, 2597 logs with almost zero engagement were removed from 25000 available learning-behavior logs. The experiment thus covers 277 active learners, 534 potential learners, and 358 inactive learners, with 13458, 7848, and 1097 behavioral logs respectively. The performance analysis has three parts: (1) analyzing the basic performance of the three groups on precision metrics, and how those metrics vary with recommendation-list length; (2) comparing the three groups on non-precision indicators, highlighting the effectiveness of the personalized exploration strategy; and (3) taking active learners as the target group and comparing the proposed collaborative filtering hybrid recommendation algorithm with existing recommendation algorithms in various respects.

Video similarity results under hybrid recommendation algorithm

The video-similarity results at different recommendation-list lengths (N) are shown in Figure 4 below. At N=10, the video-similarity coefficient is initially lower than at N=20, but once the video click count reaches about 1100 its similarity exceeds that at N=20, rising above 0.9. At N=5, the video similarity recommended by the hybrid algorithm stays around 0.4, always lower than at N=10 and N=20. Evidently, highly engaged and capable learners perform better in the learning process, and the improved similarity predictions of the hybrid recommendation algorithm come closer to the real results for them. It also shows that the longer the recommendation list, the more accurate the hybrid algorithm's video-similarity results.

Figure 4.

Video similarity results at different recommendation list lengths

Effect of Recommendation List Intensity on Students’ Learning Efficiency

This paper examines students' cumulative concentration rate on videos under different recommendation lists. Specifically, in the recommendation experiments a student is selected and his own historical learning data is combined with the historical data curve of his category of civic and political education; the hybrid recommendation algorithm then recommends exercises for him. Pearson similarity is used during recommendation to predict the student's correct-response rate in the next round of tests and to optimize the Top-N recommendation list, and a response experiment is conducted after the recommendation is complete.

The resource-recommendation results for the three learner groups are shown in Figure 5 below. The accuracy of the recommendations increases with the strength of the recommendation list (N=5, N=10, N=20). The cumulative video-resource hit rate is very high for both active and potential learners: potential learners have the highest cumulative hit rates at N=5, N=10, and N=20 (96.28%, 98.65%, and 99.03%), followed by active learners (95.44%, 96.97%, and 98.53%), whereas inactive learners' cumulative hit rates are very low (39.62%, 43.41%, and 47.08%). The first two groups provide enough behavioral logs to mine their learning rules, greatly increasing the likelihood of successful recommendation; inactive learners, by contrast, are far less engaged and capable than the other two groups, so it is difficult to summarize their learning patterns and accurately recommend civic and political education videos for them.

Figure 5.

Resource recommendations for three different learner groups

Accuracy of Recommendation Results of Hybrid Recommendation Algorithms

This paper examines active, potential, and inactive learners in terms of precision, recall, and F1 value under the hybrid recommendation algorithm at N=20; the accuracy of the recommendation results is shown in Figure 6. The improved-similarity precision of active learners is always the highest, followed by potential and inactive learners, at 97.43%, 83.77%, and 63.16% respectively. Because active learners have richer learning-behavior records, the improved similarity recommendation algorithm proposed in this paper can describe their learning state more accurately. The figure also shows that after the similarity improvement, the recall of active learners is generally lower. Specifically, active learners watched many videos, while potential and inactive learners watched far fewer; their recall rates are 58.14%, 86.43%, and 97.45% respectively. In particular, most inactive learners studied only 0 to 4 videos, which makes their recall value large. Although more videos are successfully recommended to active learners than to the other two groups, this number is still small relative to the number of educational videos they have engaged with, which is why their recall is generally lower. The F1 values decrease from active to potential to inactive learners (88.26%, 77.26%, and 43.71%): the more videos a group watches, the better the performance.

From this analysis, when the recommendation list length is fixed, precision evaluates the accuracy of the recommendation algorithm more objectively and reasonably, because it measures the proportion of successfully recommended educational videos within the recommendation list and is not affected by the learner's identity label.

Figure 6.

Accuracy of the algorithm's recommendation results

Adaptive effects of hybrid recommendation algorithms

ZPD theory states that in the next stage of learning, learners should be recommended challenging content to motivate them and draw out their potential toward a higher cognitive level. Therefore, to verify the rating effect of the proposed collaborative filtering hybrid recommendation model, the difficulty of the recommended content is examined. The students' adaptive results under the recommendation algorithm are shown in Figure 7. Combined with ZPD theory, we analyze the effect of the penalty-factor-improved hybrid recommendation algorithm on rating prediction for the three learner groups:

For active learners, the average difficulty of the recommended educational videos is 0.1163-0.1399 higher than the average difficulty of the videos they actually watched. This is reasonable for highly capable active learners: the proposed personalized exploration strategy provides more difficult learning content to stimulate their potential and explore the upper bound of their ZPD.

For potential learners, given their high ability, the difficulty of the recommended videos is also slightly higher than that of the videos they actually studied, but the difference is about 0.0662-0.0935, lower than the difficulty gap of the videos recommended to active learners. Potential learners watch fewer educational videos, and slightly increasing the difficulty can effectively motivate them and encourage greater participation, so raising the difficulty gradually helps maximize their potential.

For inactive learners, only a very small number of recommended educational videos were slightly more difficult, and most remained within their ability. Overall, the difficulty difference ranges from -0.0173 to 0.0191, which suits their weaker abilities: the difficulty of learning resources should not differ too much from a learner's ability, or it may cause cognitive overload.

Figure 7.

Students' adaptability to the recommendation algorithm

Finally, to verify the adaptive effect of the proposed recommendation algorithm, we select the active learners, the group with the highest precision among the three, for further analysis of how well the recommended content adapts, to show that our scheme can recommend for learners of different abilities. Figure 8 below compares the difficulty values of the civic education videos actually watched with those recommended at recommendation-list strength N=20. The difficulty of the videos recommended by the proposed algorithm is mostly higher than or equal to that of the videos actually studied, and the absolute difficulty difference lies between 0.001 and 0.016 (the shaded region in the figure), keeping the learning content well within the learners' ability while still offering some challenge. The improved personalized exploration strategy thus makes the proposed recommendation scheme well adapted.

Figure 8.

Difficulty of actually watched versus recommended educational videos

Conclusion

This paper improves the collaborative filtering recommendation algorithm by introducing two penalty factors into the traditional algorithm's similarity calculation. The hybrid algorithm's resource similarity is used for rating prediction, solving the decline in recommendation quality that the traditional algorithm suffers when "cold start" leaves the rating data too sparse.

Students are categorized as "A: active learners, B: potential learners, and C: inactive learners" according to how actively they click on videos. Their average video-click frequencies are A (75.89 clicks) > B (66.11 clicks) > C (58.17 clicks).

Students' cumulative hit rate increases with recommendation-list strength (N=5, N=10, N=20). For active and potential learners the cumulative hit rates exceed 95% at all three list lengths, while for inactive learners they are below 50%. Active learners consistently have the highest precision, followed by potential and inactive learners, at 97.43%, 83.77%, and 63.16% respectively. The more videos a group watches, the larger its F1 value (88.26%, 77.26%, and 43.71% for active, potential, and inactive learners) and the better the model performs.

The average difficulty of the educational videos recommended to active, potential, and inactive learners exceeds the average difficulty of the videos they actually watched by 0.1163-0.1399, 0.0662-0.0935, and -0.0173 to 0.0191 respectively, consistent with each group's ability. The difficulty of the videos recommended by the proposed algorithm is mostly higher than or equal to that of the videos actually studied, and the collaborative filtering hybrid recommendation model shows good adaptability.
