Open Access

A Course Recommendation Method Based on the Integration of Curriculum Knowledge Graph and Collaborative Filtering

16 Jun 2025


Introduction

With the rapid development of information technology, data volumes have exploded [1], and data mining technology has been widely applied in fields such as education, communication, and e-commerce [2]. In education, recommending courses based on students' learning characteristics is a key focus of data mining [3]. In recommendation systems, neighborhood-based recommendation is the most fundamental approach; it is generally divided into user-based collaborative filtering [4] and item-based collaborative filtering [5]. The user-based algorithm recommends items to a target user based on user similarity. When the target user has too few historical interactions with items, it cannot make accurate recommendations, so it is better suited to social recommendation scenarios such as news [6]. The item-based algorithm is suitable for personalized recommendations, but it also produces unsatisfactory results when users have too few interactions with items. To improve recommendation quality, researchers have introduced additional sources of information: auxiliary data such as item attributes, users' social networks, and context can be used to improve recommendation accuracy.

This paper proposes a RippleNet-CF model that combines the knowledge-graph-based RippleNet model with the collaborative filtering algorithm. The algorithm uses course entities and course attributes to simulate how a user's course interests propagate over the knowledge graph in a ripple pattern. It also takes into account the interaction history between users and courses, such as viewing records and ratings, to produce personalized recommendations. By expanding the sources of information and integrating the historical and current interests of target users, the accuracy of the recommendation results is improved. The recommendation results are evaluated using three metrics: precision, recall, and F1 score.

Related Theories
A. Collaborative Filtering Recommendation Algorithm

The item-based collaborative filtering algorithm calculates the similarity between courses based on user preference data and then recommends a list of other courses that are similar to the ones the user likes [7]. However, it faces issues such as data sparsity and cold start. This paper uses the item-based (here, course-based) collaborative filtering algorithm for personalized course recommendation. Its implementation is divided into two steps:

1) Calculate the similarity between courses

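Construct a student-course matrix. Let U = {u1, u2, u3, …, um} be the set of m students and I = {i1, i2, i3, …, in} be the set of n courses, and let $R_{m \times n}$ denote the rating matrix of students to courses, as shown in Equation (1):

$$R_{m \times n} = \begin{bmatrix}
R_{11} & R_{12} & R_{13} & \cdots & R_{1,n-1} & R_{1n} \\
R_{21} & R_{22} & R_{23} & \cdots & R_{2,n-1} & R_{2n} \\
R_{31} & R_{32} & R_{33} & \cdots & R_{3,n-1} & R_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
R_{m-1,1} & R_{m-1,2} & R_{m-1,3} & \cdots & R_{m-1,n-1} & R_{m-1,n} \\
R_{m1} & R_{m2} & R_{m3} & \cdots & R_{m,n-1} & R_{mn}
\end{bmatrix} \tag{1}$$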

Here, $R_{ij}$ represents the rating of student $u_i$ for course $i_j$; the higher the value of $R_{ij}$, the more student $u_i$ likes course $i_j$.

To measure how similar two courses are, all students' ratings for a given course are treated as an m × 1 vector. The ratings for course i are represented as Fi = {r1i, r2i, r3i, …, rmi}, and the ratings for course j as Fj = {r1j, r2j, r3j, …, rmj}. The similarity between courses i and j is computed as shown in Equation (2):

$$W_{ij} = \frac{F_i \cdot F_j}{\lVert F_i \rVert \, \lVert F_j \rVert} = \frac{\sum_{u=1}^{m} r_{ui}\, r_{uj}}{\sqrt{\sum_{u=1}^{m} r_{ui}^{2}} \, \sqrt{\sum_{u=1}^{m} r_{uj}^{2}}} \tag{2}$$

Here, Wij is the cosine similarity between course i and course j. In general, cosine similarity ranges over [-1, 1]; because the ratings here are non-negative, Wij lies in [0, 1]. The higher the value of Wij, the more similar courses i and j are, and the more likely the target user is to behave similarly towards both courses in the future.
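The similarity computation can be illustrated with a minimal sketch. It assumes the rating matrix is stored as a list of rows (one row per student), that a value of 0 stands for "no rating", and that a course with no ratings is given similarity 0; the function name `cosine_similarity` and the toy matrix are illustrative and not taken from the paper.

```python
import math

def cosine_similarity(ratings, i, j):
    """Cosine similarity between the rating columns of courses i and j."""
    dot = sum(row[i] * row[j] for row in ratings)
    norm_i = math.sqrt(sum(row[i] ** 2 for row in ratings))
    norm_j = math.sqrt(sum(row[j] ** 2 for row in ratings))
    if norm_i == 0 or norm_j == 0:
        # Assumption: a course nobody has rated is treated as similar to nothing.
        return 0.0
    return dot / (norm_i * norm_j)

# Toy student-course matrix: 3 students (rows) x 3 courses (columns).
R = [
    [5, 4, 0],
    [4, 5, 1],
    [0, 1, 5],
]
w01 = cosine_similarity(R, 0, 1)  # courses 0 and 1 are rated alike
w02 = cosine_similarity(R, 0, 2)  # courses 0 and 2 are rated very differently
```

In this toy matrix, courses 0 and 1 receive a much higher similarity than courses 0 and 2, which matches the intuition that the same students rate them alike.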

2) Select neighbors

To select neighbors, this paper ranks the candidate courses by their similarity to the target course and then takes the top-ranked courses as its neighbors.
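A minimal sketch of this ranking step is given below. It assumes the pairwise similarities are already available in a nested dictionary; the helper name `top_k_neighbors`, the neighbor count `k`, the course labels, and the similarity values are all made up for illustration, since the paper does not state how many neighbors are kept.

```python
def top_k_neighbors(similarity, course, k):
    """Return the k courses most similar to `course`, most similar first."""
    candidates = [
        (other, weight)
        for other, weight in similarity[course].items()
        if other != course
    ]
    # Sort by similarity, highest first; ties are broken by course name.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in candidates[:k]]

# Illustrative similarities for one target course (values are invented).
similarity = {
    "Course A": {"Course B": 0.82, "Course C": 0.31, "Course D": 0.67},
}
neighbors = top_k_neighbors(similarity, "Course A", k=2)
```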

B. Knowledge Graph Learning

Knowledge graphs (KG) [8] can organize vast amounts of disordered data through methods such as data mining and information processing, making it more convenient and accurate for people to obtain the information they need. A knowledge graph is a large-scale semantic network representing a complex web of relationships between entities, generally composed of (entity, relationship, entity) triples [9]. Incorporating knowledge graphs into recommendation systems can uncover deeper semantic relationships and identify the interests of target users more precisely. Currently, the application of knowledge graph feature learning [10] in recommendation systems is generally divided into path-based recommendation algorithms [11] and embedding-based recommendation algorithms [12], with representative models including TransE, TransH, SME, and NTN.

C. RippleNet Model

Due to the limitations of knowledge graph perception reconstruction methods applied to recommendation systems, scholars have proposed another model, RippleNet [13].

The knowledge graph is constructed from the triples associated with course entities, $G = \{(h, r, t) \mid h, t \in E,\ r \in R\}$, where E denotes the set of entities and R the set of relations. The goal of the RippleNet model is to use this knowledge graph together with students' course preferences to calculate the click probability of student u for the target course v. The main steps of the algorithm are as follows:

Definition 1: Item Embedding. Items are embedded based on their characteristics, semantics, and attributes. Given the embedding vector v of a specified course and the 1-hop ripple set, each outward expansion yields a triple. The relevance score between item v and each triple $(h_i, r_i, t_i)$ in the 1-hop set is calculated, and the scores are normalized using the softmax function. This gives an association probability $p_i$ for the head entity $h_i$ and the relation $R_i$ of each triple, as shown in Equation (3). The tail entities are then weighted by these probabilities and summed to give the first-hop response $o_u^{1}$, as shown in Equation (4):

$$p_i = \mathrm{softmax}\left(v^{T} R_i h_i\right) = \frac{\exp\left(v^{T} R_i h_i\right)}{\sum_{(h, r, t) \in S_u^{1}} \exp\left(v^{T} R h\right)} \tag{3}$$

$$o_u^{1} = \sum_{(h_i, r_i, t_i) \in S_u^{1}} p_i\, t_i \tag{4}$$

Here, $v$ is the item embedding vector ($v^{T}$ is its transpose), $t_i$ is the tail entity vector, $h_i$ is the head entity vector, and $R_i$ is the relation mapping matrix. $S_u^{1}$, the first-layer ripple set of a student (the first-hop ripple set), is formed by selecting a certain number of items from the student's interaction history. Essentially, this process calculates the correlation and similarity between the seed node and its connected one-hop nodes in the knowledge graph, as represented by triples and illustrated in Figure 1.
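The one-hop computation described above (softmax-normalized relevance scores, followed by a probability-weighted sum of tail vectors) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: embeddings are hand-written lists rather than learned parameters, and the helper names `matvec`, `dot`, and `hop_response` are hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a d x d matrix (list of rows) by a d-vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hop_response(v, ripple_set):
    """ripple_set: list of (h, R, t), with h and t vectors and R a d x d matrix.

    Returns (p, o): the softmax probabilities p_i and the response vector
    o = sum_i p_i * t_i.
    """
    scores = [dot(v, matvec(R, h)) for h, R, _ in ripple_set]
    top = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    p = [e / total for e in exps]
    o = [
        sum(p_i * t[k] for p_i, (_, _, t) in zip(p, ripple_set))
        for k in range(len(v))
    ]
    return p, o

# Toy 2-dimensional example with identity relation matrices.
identity = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 0.0]
ripple_set = [
    ([1.0, 0.0], identity, [0.0, 1.0]),  # head aligned with v: high relevance
    ([0.0, 1.0], identity, [1.0, 0.0]),  # head orthogonal to v: low relevance
]
p, o = hop_response(v, ripple_set)
```

The triple whose head is aligned with the course vector receives the larger probability, so its tail dominates the response vector.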

Figure 1.

RippleNet Model Diagram.

By repeating the above process, preferences propagate over the knowledge graph across multiple hops. The vectors obtained from each hop are then summed to generate the student's embedding vector (user embedding). After repeating the process H times, H output vectors o are obtained, and the final user embedding is calculated according to Equation (5):

$$u = o_u^{1} + o_u^{2} + \cdots + o_u^{H} \tag{5}$$
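The hop-by-hop expansion can be sketched as follows. The sketch follows the usual RippleNet convention that the hop-k ripple set contains the triples whose head entity was reached at hop k-1; it omits the fixed-size sampling of each ripple set, and the function name `ripple_sets`, the entity labels, and the inverse `teaches` edge are assumptions made for illustration.

```python
def ripple_sets(triples, seeds, hops):
    """Return [S1, ..., SH]; S_k holds the triples whose head was reached at hop k-1."""
    frontier = set(seeds)
    result = []
    for _ in range(hops):
        hop = [(h, r, t) for (h, r, t) in triples if h in frontier]
        result.append(hop)
        # The tails of this hop become the heads explored at the next hop.
        frontier = {t for _, _, t in hop}
    return result

# Toy graph: two courses linked through a shared (hypothetical) teacher.
triples = [
    ("Course A", "taught_by", "Teacher X"),
    ("Teacher X", "teaches", "Course B"),  # hypothetical inverse edge
    ("Course B", "belongs_to", "Computer"),
]
hops_out = ripple_sets(triples, seeds={"Course A"}, hops=2)
```

Starting from a course the student has interacted with, the first hop reaches the teacher and the second hop reaches another course taught by the same teacher.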

Finally, the likelihood of user u engaging with course v is computed by combining their respective latent representations, as shown in Equation (6):

$$y_{uv} = \sigma\left(u^{T} v\right), \qquad \sigma(x) = \frac{1}{1 + e^{-x}} \tag{6}$$
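As a small worked illustration of the last two steps, the sketch below sums per-hop response vectors into a user embedding and passes the inner product with a course embedding through the sigmoid. The vectors are invented toy values, and the function names `user_embedding` and `click_probability` are hypothetical.

```python
import math

def user_embedding(hop_responses):
    """Sum the H per-hop response vectors into a single user vector."""
    dim = len(hop_responses[0])
    return [sum(o[k] for o in hop_responses) for k in range(dim)]

def click_probability(u, v):
    """Sigmoid of the inner product between user and course embeddings."""
    z = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-z))

# Two hops (H = 2) with toy 2-dimensional responses.
u = user_embedding([[0.2, 0.7], [0.1, -0.3]])
y = click_probability(u, [1.0, 0.5])
```

Here the user vector is approximately (0.3, 0.4), the inner product with the course vector is 0.5, and the resulting probability is a little above 0.62.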

Integration of the RippleNet Model and an Item-Oriented Collaborative Filtering Approach

Conventional item-based recommendation algorithms only consider users' rating data on courses. After extracting the relationships between courses and their attributes, this paper proposes an algorithm that integrates course attribute information with user-course interaction data by combining the RippleNet model and collaborative filtering. The RippleNet model treats historical user-course rating records as implicit relationships between users and items. It constructs a knowledge graph based on the relationships among course attributes and extracts the corresponding triples for each course. Using the ripple propagation mechanism over these triples, it computes user preferences. Meanwhile, the item-based collaborative filtering method estimates a user's interest in unvisited courses by analyzing past interactions between the user and various courses. The results of the two recommendation lists are then fused linearly to generate a comprehensive course recommendation list, which draws on the strengths of both algorithms. The fusion is defined in Equation (7):

$$C = \beta \cdot Y + (1 - \beta) \cdot P \tag{7}$$

Here, β represents the weight within the range (0, 1). Y indicates the likelihood that the target user clicks on unseen courses as inferred by the RippleNet model, while P reflects the same likelihood as estimated through the collaborative filtering method.
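The linear fusion can be sketched as below. Two assumptions are made that the paper does not state: the collaborative filtering scores are already on the same [0, 1] scale as the RippleNet probabilities, and a course missing from one of the two score sets contributes 0 for that component. The function names `fuse_scores` and `top_n` and the toy scores are illustrative.

```python
def fuse_scores(ripplenet_scores, cf_scores, beta):
    """Blend two per-course score dictionaries as beta * Y + (1 - beta) * P."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in the open interval (0, 1)")
    courses = set(ripplenet_scores) | set(cf_scores)
    return {
        c: beta * ripplenet_scores.get(c, 0.0) + (1.0 - beta) * cf_scores.get(c, 0.0)
        for c in courses
    }

def top_n(scores, n):
    """Return the n highest-scoring courses; ties are broken by course name."""
    return sorted(scores, key=lambda c: (-scores[c], c))[:n]

# Toy scores for three unseen courses (values are invented).
Y = {"A": 0.9, "B": 0.4, "C": 0.6}  # RippleNet click probabilities
P = {"A": 0.2, "B": 0.8, "C": 0.6}  # collaborative filtering scores
C = fuse_scores(Y, P, beta=0.6)
ranking = top_n(C, 2)
```

With β = 0.6, the RippleNet component dominates, so course A ranks first even though collaborative filtering alone would prefer course B.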

By integrating the knowledge graph and collaborative filtering course recommendation algorithms from both direct and indirect perspectives, the limitations of using a single approach can be effectively mitigated. The knowledge graph also provides strong interpretability throughout the entire process. The corresponding flowchart of the integrated RippleNet and collaborative filtering recommendation algorithm (RippleNet-CF) is shown in Figure 2.

Figure 2.

Flowchart of the Integrated Recommendation Algorithm

Experimental Results and Analysis
A. Dataset and Preprocessing

The dataset used in this experiment is MOOCCube, which was collected by a research team from Tsinghua University from the XuetangX platform. They extracted entities such as courses, concepts, and students, and built a knowledge base based on the complex relationships among these entities. This educational resource database is large in scale and rich in data, especially with detailed records of student behavior, including learning duration, frequency, and video segments viewed. The dataset used in this experiment involves nearly 200,000 students and approximately 5 million video viewing records [14]. Before conducting the experiment, the collected online student dataset needs to be preprocessed. The specific steps are as follows:

1) Integrate the video viewing information of each student from the MOOCCube dataset, calculating the total duration of the videos belonging to the same course as well as the student's specific viewing details.

2) Handle missing and duplicate values: records that are missing or duplicated are removed directly.

3) Determine each learner's rating from the ratio between the actual viewing time (t) and the total video length (T), that is, rating score S = t/T. These scores are then categorized into five levels, with the detailed classification criteria provided in Table 1.
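The mapping from viewing ratio to rating level can be sketched as follows, using the thresholds listed in Table 1. The function name `rating_level` and the use of seconds as the time unit are assumptions; a ratio above 1 (for example, from re-watching) simply falls into the top level.

```python
def rating_level(watched_seconds, total_seconds):
    """Map the viewing ratio S = t / T to a rating level from 1 to 5."""
    if total_seconds <= 0:
        raise ValueError("total video length must be positive")
    s = watched_seconds / total_seconds
    if s < 0.2:
        return 1
    if s < 0.4:
        return 2
    if s < 0.6:
        return 3
    if s < 0.8:
        return 4
    return 5

# A student who watched 7 of 10 minutes of a course's videos.
level = rating_level(420, 600)
```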

Table 1. Course rating

| Rating score S (= t/T) | Rating level |
|---|---|
| S < 0.2 | 1 |
| 0.2 ≤ S < 0.4 | 2 |
| 0.4 ≤ S < 0.6 | 3 |
| 0.6 ≤ S < 0.8 | 4 |
| S ≥ 0.8 | 5 |

B. Constructing a Knowledge Graph

Based on the results of data preprocessing, the user-course interaction is marked as Yuv = 1 if the user's rating for the course is greater than or equal to 4, and as Yuv = 0 otherwise. For the courses that users have interacted with, the relationships between the courses' own attributes are extracted to construct triples. Because each course has too many entities to construct triples for all of them, each course is associated with only five extracted entities to lower the cost of building the knowledge representation, as illustrated in Table 2.
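The labeling rule can be written as a one-line threshold. In the sketch below, the function name `interaction_label`, the user identifiers, and the ratings are hypothetical; only the threshold of 4 comes from the text.

```python
def interaction_label(rating, threshold=4):
    """Return 1 for a positive interaction (rating >= threshold), else 0."""
    return 1 if rating >= threshold else 0

# Invented (user, course) ratings on the 1-5 scale.
ratings = {
    ("u1", "Data Structures"): 5,
    ("u1", "Advanced Mathematics"): 2,
    ("u2", "Data Structures"): 4,
}
labels = {pair: interaction_label(r) for pair, r in ratings.items()}
```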

Table 2. Extraction of some course entities

| Course name | Entities |
|---|---|
| Popular Java Framework | Tsinghua University Press, October 2018, Lectured by Li Lian, Knowledge Points, Computer |
| Data Structures | People's Publishing House, February 2022, Yu Yun, Knowledge Points, Computer |
| Database Principles | Posts and Telecommunications Press, October 2018, Cao Lan, Knowledge Points, Computer |
| Advanced Mathematics | Tsinghua University Press, September 2018, Zhang Yu, Knowledge Points, Mathematics |

After the entities are determined, the corresponding ternary relationships (triples) are constructed. Five types of relationships are defined, as shown in Table 3.

Table 3. Ternary entity relationships

| Entity | Relationship | Entity |
|---|---|---|
| Course Name | Taught by | Teacher |
| Course Name | Published by | Specific Publisher |
| Course Name | Time | Specific Publication Time |
| Course Name | Belongs to | Specific Category |
| Course Name | Contains | Knowledge Points |

The ternary relationships are then used to associate courses with one another. For instance, if a teacher teaches several courses, those courses are linked through their common teacher, as shown in Figure 3. In this experiment, a total of 447,517 ternary relationships were constructed.
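A minimal sketch of this construction is shown below: each course's attribute record is turned into triples, and courses that share a tail entity (such as a teacher) are then grouped. The relation identifiers, the attribute keys, the helper names, and the toy catalog are all assumptions made for illustration; they are not the paper's actual schema or data.

```python
from collections import defaultdict

# Hypothetical mapping from attribute keys to relation names.
RELATIONS = {
    "teacher": "taught_by",
    "publisher": "published_by",
    "published": "publication_time",
    "category": "belongs_to",
    "knowledge_point": "contains",
}

def course_triples(course, attributes):
    """Turn one course's attribute record into (head, relation, tail) triples."""
    return [
        (course, RELATIONS[key], value)
        for key, value in attributes.items()
        if key in RELATIONS
    ]

def courses_sharing_tail(triples, relation):
    """Group courses that point to the same tail entity via `relation`."""
    by_tail = defaultdict(set)
    for head, rel, tail in triples:
        if rel == relation:
            by_tail[tail].add(head)
    return {tail: sorted(heads) for tail, heads in by_tail.items() if len(heads) > 1}

# Toy catalog with placeholder names.
catalog = {
    "Course A": {"teacher": "Teacher X", "category": "Computer"},
    "Course B": {"teacher": "Teacher X", "category": "Computer"},
    "Course C": {"teacher": "Teacher Y", "category": "Mathematics"},
}
triples = [t for course, attrs in catalog.items() for t in course_triples(course, attrs)]
shared_teacher = courses_sharing_tail(triples, "taught_by")
```

In the toy catalog, Course A and Course B become connected through Teacher X, which is the kind of indirect link that ripple propagation can follow.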

Figure 3.

Partial View of the Knowledge Graph

C. Evaluation Metrics

The experiments in this paper adopt a Top-N recommendation strategy for delivering personalized suggestions to target users. Performance is assessed through three evaluation indicators: precision, recall, and F1 score. To be consistent with the formulas below, L(u) denotes the recommendation list generated by the algorithm for user u, while R(u) denotes the list of courses that user u actually interacted with in the test dataset. Here, U refers to the set of users, and I signifies the collection of available courses.

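Precision: The calculation method is shown in Equation (8).

$$\mathrm{Precision} = \frac{\sum_{u \in U} \lvert L(u) \cap R(u) \rvert}{\sum_{u \in U} \lvert L(u) \rvert} \tag{8}$$

Recall: The calculation method is shown in Equation (9).

$$\mathrm{Recall} = \frac{\sum_{u \in U} \lvert L(u) \cap R(u) \rvert}{\sum_{u \in U} \lvert R(u) \rvert} \tag{9}$$

F1 Score (F-Measure): The calculation method is shown in Equation (10).

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{10}$$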
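The three metrics can be computed with a short sketch. It assumes that precision is normalized by the total number of recommended courses and recall by the total number of courses users actually took in the test set, and that each metric is 0 when its denominator is 0. The function name `evaluate` and the toy data are illustrative.

```python
def evaluate(recommended, actual):
    """Micro-averaged precision, recall and F1 over all users.

    recommended: user -> list of recommended courses
    actual:      user -> set of courses the user took in the test set
    """
    hits = sum(len(set(recommended[u]) & set(actual.get(u, ()))) for u in recommended)
    n_recommended = sum(len(recommended[u]) for u in recommended)
    n_actual = sum(len(actual.get(u, ())) for u in recommended)
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_actual if n_actual else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1

# Toy example: two users with Top-2 recommendation lists.
recommended = {"u1": ["A", "B"], "u2": ["C", "D"]}
actual = {"u1": {"A", "C", "E"}, "u2": {"D"}}
precision, recall, f1 = evaluate(recommended, actual)
```

In this example, two of the four recommended courses are hits and the users took four courses in total, so precision, recall, and F1 all equal 0.5.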

D. Experimental Results Analysis

In this paper's RippleNet-CF algorithm, the weight β in Equation (7) needs to be tuned experimentally. The results are shown in Figure 4:

Figure 4.

RippleNet-CF Results Chart

From Figure 4, it can be seen that both precision and recall increase as the weight β increases within the range [0.1, 0.6], and both reach their maximum when β equals 0.6. However, the coverage rate is highest at β = 0.4 and then decreases as the weight increases.

From Figures 5, 6, and 7, it can be seen that at β = 0.6 and H = 2, as the length of the recommendation list increases, the RippleNet-CF method achieves the best precision, recall, and F1 scores compared to the other four algorithms. This is because RippleNet-CF not only uses the interaction information between users and items but also mines the potential connections between courses to expand the information source, thereby improving recommendation quality.

Figure 5.

Precision Results Chart

Figure 6.

Recall Results Chart

Figure 7.

F1 Score Chart

Summary and Future Work

Because the traditional item-based collaborative filtering algorithm does not fully utilize the attribute information of the items themselves, this paper proposes the RippleNet-CF method, which uses a knowledge graph of course attributes together with interaction information. The method uses the knowledge graph to explore potential connections between courses and collaborative filtering to exploit existing user-course interactions, thereby alleviating the data sparsity and cold start problems. However, courses are offered by semester and follow a strong sequential order in practice. Future work will consider incorporating time series features to further improve accuracy.

Language:
English
Publication frequency:
4 times per year
Journal subjects:
Computer Science, Computer Science, other