Open access

Analysis of Factors Influencing Public Foundation Course Performance among University Students Based on Decision Tree Models

  
27 Feb. 2025

Introduction

Against the backdrop of China's examination-oriented education, academic performance has become a significant issue affecting students' academic progress, family harmony, and personal development. Identifying the factors that influence academic performance and exploring ways to improve it are crucial for enhancing learning outcomes and teaching quality. As China's education system expands, schools increasingly collect student data not only for storage but also for analysis, in order to understand each student's situation and provide tailored teaching that improves performance. The development of big data and the pandemic-driven shift to online classes call for more sophisticated data processing to monitor students' real-time conditions and ensure educational quality. The factors affecting academic performance are multifaceted. As with an iceberg, students' grades and visible academic performance are only the tip, while the underlying factors often go unnoticed; analyzing these underlying factors is essential.

Previous studies have used CBR-KBS models to predict and evaluate student datasets [1], explored the sources of and reasons for the factors influencing the academic performance of Portuguese students [2], and examined the impact of various factors on learning quality using decision tree models [3]. A decision tree-based system for analyzing and predicting university student performance has been developed to help students overcome weaknesses and improve their grades [4]. A comparison of the accuracy of three decision tree algorithms led to performance predictors that help teachers focus on students [5]. Four classification algorithms were applied to predict student performance, improving the accuracy of decision tree algorithms [6]. Decision tree algorithms combined with educational data mining methods have been used to predict student performance [7].
Classification techniques in data mining have been employed to uncover patterns between admission and graduation scores [8]. One study discusses three student performance prediction models, focusing on how the J48 decision tree algorithm can help improve scores [9]. Another introduces an integrated multi-classifier approach to student performance prediction based on three complementary algorithms [10]. The ID3 decision tree induction algorithm has been used to build a performance prediction model that supports struggling students [11]. A decision tree model was established to automatically detect anomalies in student exam scores [12]. Decision tree algorithms have been applied to student databases to predict final grades and identify weaker students for additional attention [13]. Educational data mining and decision tree algorithms have been used to analyze the relationship between quantitative and qualitative data on student performance [14]. Decision tree methods have proven useful for predicting exam scores [15]. A study based on more than 80,000 score records from a university in Beijing investigated course correlations and score prediction [16]. To support student learning and school management, another study used decision tree algorithms to examine the factors influencing student performance [17]. To address the difficulty of extracting useful information from scores, a score statistics and analysis system based on decision tree technology was designed [18]. Drawing on extensive prior research, a new approach was proposed for building an early warning system for exam scores using data mining techniques [19]. Data mining algorithms and deep learning techniques have been applied to analyze student behavior, leading to improved methods for predicting academic performance [20].

Factors Influencing Academic Performance in Public Foundation Courses
Cognitive Factors

Cognition, often associated with IQ tests, encompasses intelligence: the ability to solve problems or create practical solutions in everyday life. Current scientific research divides intelligence into two primary types: fluid and crystallized.

Fluid Intelligence: Refers to the speed of information processing and the handling of abstract concepts, logical relationships, and basic knowledge. It underlies solving novel problems, logical reasoning, and pattern recognition, and is reflected in perception, memory, and calculation speed.

Crystallized Intelligence: Also known as acquired intelligence, this involves the application of learned knowledge and experience, including language skills, judgment, association, mechanical ability, and numerical computation.

In academic settings, the cognitive abilities used most often include deductive reasoning, detail perception, numerical computation, semantic understanding, and spatial orientation.

Deductive Reasoning: Assesses the ability to infer and predict, drawing conclusions from given information, often evaluating short-term memory and problem-solving skills.

Detail Perception: Measures the ability to identify errors in written materials, numbers, and charts, distinguish similarities and differences, and correct mistakes, assessing the speed of understanding and comparing lexical information.

Numerical Computation: Evaluates numerical calculation and basic reasoning skills, measuring the ability to handle large amounts of data efficiently.

Semantic Understanding: Assesses vocabulary knowledge and comprehension, that is, the ability to understand and recognize words with similar or opposite meanings and to learn or work in environments that require clear instructions.

Spatial Orientation: Measures the ability to form and manipulate mental images of objects, which is related to mechanical reasoning, and evaluates the ability to visualize and compare shapes in tasks that require visual skills.

Each individual possesses multiple intelligences, and the combination of different intelligences in the same learning scenario can vary in effectiveness, influencing academic performance.

Psychological Factors

Learning Motivation: Varies among individuals and stems from recognizing the value of knowledge, self-awareness, and emotional interest. For example, a student's thoughts, emotions, and perceptions of teachers directly affect learning outcomes.

Learning Willpower: The perseverance needed to overcome difficulties in completing learning tasks, which tests a student's tolerance for stress. Longer study durations often indicate stronger willpower, while poor self-management and distractibility lead to inconsistent performance.

Learning Attitude: Comprises three aspects:

Affective: Reflects the student's interest and initiative in learning.

Cognitive: Represents the student's beliefs and awareness about learning.

Behavioral: Reveals the student's actions and observable behaviors toward learning, which influence their engagement and choices.

Learning Habits: Formed through repeated actions over time. Good habits enhance understanding, self-discipline, willpower, and attitude adjustment, and they are crucial in subjects that require long-term accumulation.

Self-Concept: Involves three dimensions:

Self-Esteem: The degree to which individuals value respect from themselves and others. Low self-esteem can lead to apathy and self-neglect.

Self-Confidence: The belief in one's own abilities, often a foundation for success and motivation.

Independence: The need for personal space, independent thinking, and character development, which requires balanced guidance from parents and teachers to avoid extremes.

Environmental Factors

Family Environment: The thoughts and behaviors of parents and relatives significantly affect a child's psychological development. A harmonious family fosters positive social interactions, while a dysfunctional family can lead to extreme thinking or heightened resistance.

School Environment:

Academic Pressure: Can cause anxiety, as students bear the burden of expectations.

Peer Competition: Healthy competition is beneficial, but unhealthy competition can harm both academic performance and peer relationships.

Teacher-Student Relationship: Teachers should reflect on their practices to maintain positive relationships and strengthen connections with students.

Social Environment: Advanced communication technologies make interaction easier but can also lead to problems such as social indifference and reduced trust. Negative societal views, such as the belief that education is unnecessary for success, can mislead students. It is crucial to correct such misconceptions and provide psychological support.

These factors collectively influence students' academic performance in public foundation courses, highlighting the need for a comprehensive approach to education.

Decision Tree Model
Decision tree definition

As the name suggests, a decision tree is modeled on a tree. A real tree consists of roots, a trunk, leaves, and other indispensable parts, so a decision tree likewise contains corresponding elements: a root node, internal nodes, and leaf nodes. A decision tree has the shape of an inverted tree, with the root at the top and the leaves at the bottom: it starts from a root node, branches out into multiple internal nodes, and ends in several leaf nodes. In the decision tree diagram, rectangles represent decision nodes, circles represent state nodes, and triangles represent result nodes (leaf nodes). The decision tree flowchart is shown in Figure 1.

Figure 1.

Decision tree flowchart
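In the classification trees used in the rest of this article, the root and internal nodes each test one attribute and have one child per attribute value, while leaf nodes hold a class label; classifying a sample means walking from the root down to a leaf. The following is a minimal Python sketch of that structure, not the article's implementation: the `Node` class, the `predict` helper, and the attribute names and labels in the example tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One node of a classification tree (hypothetical structure for illustration)."""
    attribute: Optional[str] = None                            # attribute tested here; None for a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)  # attribute value -> child node
    label: Optional[str] = None                                # class label stored at a leaf

    def is_leaf(self) -> bool:
        return not self.children


def predict(root: Node, sample: Dict[str, str]) -> Optional[str]:
    """Walk from the root to a leaf, following the branch that matches the sample's value."""
    node = root
    while not node.is_leaf():
        node = node.children[sample[node.attribute]]
    return node.label


# Hypothetical tree: the root tests "motivation", one internal node tests "effort".
tree = Node(
    attribute="motivation",
    children={
        "high": Node(
            attribute="effort",
            children={"high": Node(label="600+"), "low": Node(label="500-600")},
        ),
        "low": Node(label="below 500"),
    },
)

print(predict(tree, {"motivation": "high", "effort": "low"}))  # 500-600
```

A leaf is simply a node with no children, so the same class serves for the root, internal, and leaf nodes.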

Feature selection
Information entropy

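Information entropy measures the uncertainty of a random variable. The information content of an outcome $x_i$ is $I(x_i) = \log_2 (1/P(x_i))$, so less probable outcomes carry more information, and entropy is the expected information content. For a sample set $D$ in which class $k$ accounts for the proportion $p_k$ ($k = 1, \ldots, K$), the information entropy $\mathrm{Ent}(D)$ measures the purity of $D$: the smaller $\mathrm{Ent}(D)$, the purer $D$.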

$$H(X) = E[I(x_i)] = E\left[\log_2 \frac{1}{P(x_i)}\right] = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)$$

$$\mathrm{Ent}(D) = -\sum_{k=1}^{K} p_k \log_2 p_k$$
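The entropy formula translates directly into code. This is a minimal sketch, assuming the class labels of $D$ are given as a Python list; the function name `entropy` is our own. It uses the $p_k \log_2(1/p_k)$ form of each term, which is equal to $-p_k \log_2 p_k$ and returns exactly 0.0 for a pure set.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Ent(D) = -sum_k p_k * log2(p_k), written as sum_k p_k * log2(1 / p_k)."""
    n = len(labels)
    # Class k (count c, so p_k = c / n) contributes p_k * I(k), with I(k) = log2(1 / p_k).
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


print(entropy(["pass", "pass", "fail", "fail"]))  # 1.0: two equally likely classes
print(entropy(["pass", "pass", "pass", "pass"]))  # 0.0: a pure set
```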

Suppose the discrete attribute $a$ takes $V$ possible values $\{a^1, a^2, \ldots, a^V\}$. Splitting the sample set $D$ on $a$ generates $V$ branch nodes, where the $v$-th branch node contains $D^v$, the subset of samples in $D$ that take the value $a^v$ on $a$. The information gain of this split is:

$$\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|} \mathrm{Ent}(D^v)$$

At each step of the recursion, the ID3 algorithm selects the attribute with the maximum information gain as the splitting attribute.
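The information gain formula and the ID3 selection rule can be sketched as follows. This is an illustrative sketch, assuming each sample is a dict of attribute values plus a class column; `info_gain`, `id3_choose`, and the toy records are hypothetical and not taken from the article's data.

```python
from collections import Counter, defaultdict
from math import log2
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def entropy(labels: Sequence[Hashable]) -> float:
    """Ent(D) = -sum_k p_k log2 p_k."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def info_gain(rows: List[Row], attr: str, target: str) -> float:
    """Gain(D, a) = Ent(D) - sum_v |D^v| / |D| * Ent(D^v)."""
    subsets: Dict[Hashable, List[Hashable]] = defaultdict(list)  # a^v -> class labels in D^v
    for row in rows:
        subsets[row[attr]].append(row[target])
    n = len(rows)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy([row[target] for row in rows]) - weighted


def id3_choose(rows: List[Row], attrs: List[str], target: str) -> str:
    """One ID3 step: return the attribute with the largest information gain."""
    return max(attrs, key=lambda a: info_gain(rows, a, target))


# Hypothetical toy records.
rows = [
    {"motivation": "high", "family": "rich", "result": "pass"},
    {"motivation": "high", "family": "poor", "result": "pass"},
    {"motivation": "low", "family": "rich", "result": "fail"},
    {"motivation": "low", "family": "poor", "result": "fail"},
]
print(info_gain(rows, "motivation", "result"))              # 1.0
print(info_gain(rows, "family", "result"))                  # 0.0
print(id3_choose(rows, ["family", "motivation"], "result"))  # motivation
```

Applying `id3_choose` recursively to each subset $D^v$, with the chosen attribute removed from the candidates, grows a complete ID3 tree.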

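Gain ratio

The information gain ratio divides the information gain by the split information $I$ of the attribute: $\mathrm{Gain\_ratio} = \mathrm{Gain}(A) / I$.

Here the information gain is $\mathrm{Gain}(S, A) = E(S) - E(S, A)$, where $E(S)$ is the entropy of sample set $S$ and $E(S, A)$ is the weighted entropy of the subsets obtained by splitting $S$ on attribute $A$.

In the notation used above, the gain ratio is defined as
$$\mathrm{Gain\_ratio}(D, a) = \frac{\mathrm{Gain}(D, a)}{\mathrm{IV}(a)}$$
where the intrinsic value $\mathrm{IV}(a)$ plays the role of the split information:
$$\mathrm{IV}(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|}$$
$\mathrm{IV}(a)$ grows as $a$ takes more distinct values, so the gain ratio (used by the C4.5 algorithm) counteracts the preference of information gain for attributes with many values.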
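To see how the gain ratio penalizes many-valued attributes, the sketch below compares two splits of the same four hypothetical labels: a two-way split, and a split that puts every sample in its own branch, as a student-ID attribute would. Both have the same information gain, but the second has a larger $\mathrm{IV}(a)$ and therefore a smaller gain ratio. The function names and data are our own, and the sketch assumes every branch subset is non-empty.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Ent(D) = -sum_k p_k log2 p_k."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def gain_and_ratio(subsets: List[List[Hashable]]) -> Tuple[float, float]:
    """Return (Gain(D, a), Gain_ratio(D, a)) for a split of D into non-empty subsets D^1..D^V."""
    parent = [y for s in subsets for y in s]  # D is the union of the branch subsets
    n = len(parent)
    gain = entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)
    # Intrinsic value IV(a) = -sum_v |D^v|/|D| * log2(|D^v|/|D|)
    iv = sum(len(s) / n * log2(n / len(s)) for s in subsets)
    return gain, (gain / iv if iv > 0 else 0.0)


two_way = [["pass", "pass"], ["fail", "fail"]]              # e.g. a two-valued attribute
one_per_sample = [["pass"], ["pass"], ["fail"], ["fail"]]  # e.g. a student-ID attribute
print(gain_and_ratio(two_way))         # (1.0, 1.0)
print(gain_and_ratio(one_per_sample))  # (1.0, 0.5)
```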

Gini index

The CART decision tree uses the Gini index to select splitting attributes. With $p_k$ denoting the proportion of class $k$ in dataset $D$ and $|\mathcal{Y}|$ the number of classes, the Gini value of $D$ is defined as follows; the smaller $\mathrm{Gini}(D)$, the purer $D$.

$$\mathrm{Gini}(D) = \sum_{k=1}^{|\mathcal{Y}|} \sum_{k' \neq k} p_k p_{k'} = 1 - \sum_{k=1}^{|\mathcal{Y}|} p_k^2$$

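The Gini index of attribute $a$ is the weighted Gini value of its branches, and CART selects the attribute with the smallest Gini index:
$$\mathrm{Gini\_index}(D, a) = \sum_{v=1}^{V} \frac{|D^v|}{|D|} \mathrm{Gini}(D^v)$$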

Equivalently, writing $p(x_i)$ for the probability of class $x_i$, the purity of dataset $D$ can be measured by its Gini value:
$$\mathrm{Gini}(D) = \sum_{i=1}^{n} p(x_i)\,(1 - p(x_i)) = 1 - \sum_{i=1}^{n} p(x_i)^2$$

where $n$ is the number of classes. $\mathrm{Gini}(D)$ is the probability that two samples drawn independently at random from $D$ belong to different classes. Therefore, the smaller $\mathrm{Gini}(D)$, the higher the purity of $D$.
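The two forms of the Gini value can be checked against each other in a few lines. This is a minimal sketch with hypothetical function names: `gini` uses $1 - \sum_k p_k^2$, and `gini_pairwise` uses the double sum over pairs of different classes, $\sum_k \sum_{k' \neq k} p_k p_{k'}$.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini(D) = 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_pairwise(labels: Sequence[Hashable]) -> float:
    """Same quantity as a probability: sum over ordered pairs of different classes of p_k * p_k'."""
    n = len(labels)
    p = [c / n for c in Counter(labels).values()]
    return sum(p[i] * p[j] for i in range(len(p)) for j in range(len(p)) if i != j)


mixed = ["pass", "pass", "fail", "fail"]
print(gini(mixed), gini_pairwise(mixed))  # 0.5 0.5
print(gini(["pass"] * 4))                 # 0.0
```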

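For a CART binary split, the dataset $D$ is divided according to whether attribute $A$ takes the value $a$:
$$D_1 = \{(x, y) \in D \mid A(x) = a\}, \qquad D_2 = D - D_1$$

Under the condition of attribute $A$, the Gini index of sample set $D$ is defined as
$$\mathrm{GiniIndex}(D \mid A = a) = \frac{|D_1|}{|D|} \mathrm{Gini}(D_1) + \frac{|D_2|}{|D|} \mathrm{Gini}(D_2)$$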
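The binary split above can be sketched as a search over all tests of the form $A(x) = a$, keeping the one with the smallest Gini index. This is a minimal sketch, assuming samples are dicts with a class column; `gini_split`, `best_binary_split`, and the toy records are hypothetical. An empty side of a split is treated as pure so the function never divides by zero.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]


def gini(labels: Sequence[Hashable]) -> float:
    """Gini(D) = 1 - sum_k p_k^2; an empty subset is treated as pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(rows: List[Row], attr: str, value: Hashable, target: str) -> float:
    """GiniIndex(D | A = a) for D1 = {A(x) = a} and D2 = D - D1."""
    d1 = [r[target] for r in rows if r[attr] == value]
    d2 = [r[target] for r in rows if r[attr] != value]
    n = len(rows)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


def best_binary_split(rows: List[Row], attrs: List[str], target: str) -> Tuple[str, Hashable]:
    """Return the (attribute, value) test with the smallest Gini index."""
    candidates = [(a, v) for a in attrs for v in sorted({r[a] for r in rows})]
    return min(candidates, key=lambda av: gini_split(rows, av[0], av[1], target))


# Hypothetical toy records.
rows = [
    {"motivation": "high", "family": "rich", "result": "pass"},
    {"motivation": "high", "family": "poor", "result": "pass"},
    {"motivation": "low", "family": "rich", "result": "fail"},
    {"motivation": "low", "family": "poor", "result": "fail"},
]
print(gini_split(rows, "family", "rich", "result"))                 # 0.5
print(best_binary_split(rows, ["family", "motivation"], "result"))  # ('motivation', 'high')
```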

Pruning

If grown without restriction, a decision tree keeps splitting until it separates the classes of the training set almost perfectly, which makes it depend too heavily on the training sample (overfitting). Pruning addresses this problem. There are two pruning strategies:

Pre-pruning: during construction, evaluate each candidate split before deciding whether to branch.

Post-pruning: after constructing a complete decision tree, evaluate the necessity of each branch from the bottom up.

Many post-pruning methods can be used with classification and regression trees (CART). Cost-complexity pruning computes, for each internal node $t$, the surface error rate gain value:
$$\alpha = \frac{C(t) - C(T_t)}{|T_t| - 1}$$

$|T_t|$: the number of leaf nodes in the subtree $T_t$ rooted at $t$.

$C(t)$: the error cost of node $t$ if its subtree is pruned to the single node $t$, $C(t) = r(t) \cdot p(t)$.

$r(t)$: the error rate of node $t$.

$p(t)$: the proportion of all samples that fall on node $t$.

$C(T_t)$: the error cost of the subtree $T_t$ if node $t$ is not pruned, equal to the sum of the error costs of all leaf nodes of $T_t$.

A small $\alpha$ means the subtree reduces the error cost only slightly per additional leaf, so the node with the smallest $\alpha$ is pruned first.
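The quantities defined above can be combined into a short calculation. This is a minimal sketch with made-up node statistics: the `Leaf` class, the helper names, and the two hypothetical internal nodes `t1` and `t2` are our own. It computes $\alpha$ for each candidate node and picks the one with the smallest $\alpha$, the "weakest link" that cost-complexity pruning removes first.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Leaf:
    error_rate: float  # r: error rate at this leaf
    share: float       # p: fraction of all training samples that reach this leaf


def error_cost(error_rate: float, share: float) -> float:
    """C = r * p."""
    return error_rate * share


def alpha(node_error_rate: float, node_share: float, leaves: List[Leaf]) -> float:
    """alpha = (C(t) - C(T_t)) / (|T_t| - 1) for internal node t whose subtree T_t has `leaves`."""
    c_t = error_cost(node_error_rate, node_share)  # cost if t is collapsed into a leaf
    c_subtree = sum(error_cost(leaf.error_rate, leaf.share) for leaf in leaves)  # cost of keeping T_t
    return (c_t - c_subtree) / (len(leaves) - 1)


# Hypothetical internal nodes with made-up statistics.
alphas: Dict[str, float] = {
    "t1": alpha(0.25, 0.4, [Leaf(0.1, 0.2), Leaf(0.2, 0.1), Leaf(0.0, 0.1)]),
    "t2": alpha(0.1, 0.3, [Leaf(0.0, 0.15), Leaf(0.1, 0.15)]),
}
print({name: round(a, 4) for name, a in alphas.items()})  # {'t1': 0.03, 't2': 0.015}
print(min(alphas, key=alphas.get))                         # t2 is pruned first
```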

Processing of continuous and missing values
Continuous value processing

A continuous attribute takes too many values for each value to become its own branch, so it must be discretized. The common method is bi-partition (dichotomy): given the sample set $D$ and a continuous attribute $a$, find a split point $t$ that divides $D$ into $D_t^-$, the samples whose value on $a$ is at most $t$, and $D_t^+$, the samples whose value is greater than $t$.

First, sort the values of $a$ in ascending order and take the midpoint of each pair of adjacent values as a candidate split point.

Second, for each candidate split point, calculate the information gain of dividing $D$ at that point.

Third, select the candidate split point with the maximum information gain as the optimal split point.

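$$\mathrm{Gain}(D, a) = \max_{t \in T_a} \mathrm{Gain}(D, a, t) = \max_{t \in T_a} \left( \mathrm{Ent}(D) - \sum_{\lambda \in \{-, +\}} \frac{|D_t^\lambda|}{|D|} \mathrm{Ent}(D_t^\lambda) \right)$$
where $T_a$ is the set of candidate split points of $a$.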
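The three bi-partition steps can be sketched as follows. This is a minimal sketch, not the article's code: `best_threshold` and the study-hours data are hypothetical. Adjacent equal values are skipped because their midpoint would not separate any samples; if every value is equal, the function returns `(None, -1.0)`.

```python
from collections import Counter
from math import log2
from typing import Hashable, Optional, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Ent(D) = -sum_k p_k log2 p_k."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def best_threshold(values: Sequence[float], labels: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Bi-partition of a continuous attribute: return (best split point t, its information gain)."""
    pairs = sorted(zip(values, labels))  # step 1: sort by attribute value
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n, base = len(ys), entropy(ys)
    best_t: Optional[float] = None
    best_gain = -1.0
    for i in range(n - 1):
        if xs[i] == xs[i + 1]:
            continue  # equal neighbours give no new split point
        t = (xs[i] + xs[i + 1]) / 2  # candidate: midpoint of adjacent values
        left, right = ys[: i + 1], ys[i + 1 :]  # D_t^- (a <= t) and D_t^+ (a > t)
        # Step 2: information gain of splitting D at t.
        gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best_gain:  # step 3: keep the maximum
            best_t, best_gain = t, gain
    return best_t, best_gain


# Hypothetical weekly study hours and exam outcomes.
hours = [1, 2, 3, 6, 7, 8]
outcome = ["fail", "fail", "fail", "pass", "pass", "pass"]
print(best_threshold(hours, outcome))  # (4.5, 1.0)
```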
Missing value processing

In practice, samples are often incomplete, that is, some attribute values are missing. Simply discarding such samples wastes a great deal of information, so two problems must be solved when attribute values are missing:

How to select the splitting attribute when some values of the attribute are missing.

Given a splitting attribute, how to assign a sample whose value on that attribute is missing to a branch.

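Let $\tilde{D}$ denote the subset of samples in $D$ that have no missing value on attribute $a$, $\tilde{D}_k$ the samples in $\tilde{D}$ belonging to class $k$, and $\tilde{D}^v$ the samples in $\tilde{D}$ that take the value $a^v$ on $a$. Each sample $x$ is given a weight $w_x$, initialized to 1 at the root node. Then define:
$$\rho = \frac{\sum_{x \in \tilde{D}} w_x}{\sum_{x \in D} w_x}, \qquad \tilde{p}_k = \frac{\sum_{x \in \tilde{D}_k} w_x}{\sum_{x \in \tilde{D}} w_x} \ (1 \le k \le |\mathcal{Y}|), \qquad \tilde{r}_v = \frac{\sum_{x \in \tilde{D}^v} w_x}{\sum_{x \in \tilde{D}} w_x} \ (1 \le v \le V)$$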

The information gain is first calculated on $\tilde{D}$, the subset of samples with no missing value on $a$. The final information gain equals this gain multiplied by $\rho$, the proportion of $\tilde{D}$ in $D$:

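$$\mathrm{Gain}(D, a) = \rho \times \mathrm{Gain}(\tilde{D}, a) = \rho \times \left( \mathrm{Ent}(\tilde{D}) - \sum_{v=1}^{V} \tilde{r}_v \, \mathrm{Ent}(\tilde{D}^v) \right)$$
where $\mathrm{Ent}(\tilde{D}) = -\sum_{k=1}^{|\mathcal{Y}|} \tilde{p}_k \log_2 \tilde{p}_k$.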

If a sample $x$ has a missing value on the splitting attribute $a$, it is placed in all branch nodes simultaneously, with different weights: in the branch corresponding to the value $a^v$, its weight becomes $w_x \cdot \tilde{r}_v$.
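The weighted gain and the weight-splitting rule can be sketched together. This is a minimal sketch, assuming `None` marks a missing attribute value; `weighted_entropy`, `gain_with_missing`, `branch_ratios`, and the five toy samples are hypothetical. Only the fifth sample lacks a value, so $\rho = 4/5$.

```python
from collections import defaultdict
from math import log2
from typing import Dict, Hashable, List, Optional, Sequence


def weighted_entropy(labels: Sequence[Hashable], weights: Sequence[float]) -> float:
    """Entropy in which each sample counts with its weight w_x (p_k is a weighted share)."""
    total = sum(weights)
    per_class: Dict[Hashable, float] = defaultdict(float)
    for y, w in zip(labels, weights):
        per_class[y] += w
    return sum((w / total) * log2(total / w) for w in per_class.values() if w > 0)


def branch_ratios(values: Sequence[Optional[Hashable]], weights: Sequence[float]) -> Dict[Hashable, float]:
    """r~_v for each value a^v, computed over the samples whose value is known."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for v, w in zip(values, weights):
        if v is not None:
            totals[v] += w
    known_w = sum(totals.values())
    return {v: t / known_w for v, t in totals.items()}


def gain_with_missing(values: Sequence[Optional[Hashable]], labels: Sequence[Hashable],
                      weights: Sequence[float]) -> float:
    """Gain(D, a) = rho * (Ent(D~) - sum_v r~_v * Ent(D~^v))."""
    known = [i for i, v in enumerate(values) if v is not None]  # indices of D~
    rho = sum(weights[i] for i in known) / sum(weights)         # share of D~ in D
    ent = weighted_entropy([labels[i] for i in known], [weights[i] for i in known])
    branches: Dict[Hashable, List[int]] = defaultdict(list)     # a^v -> indices of D~^v
    for i in known:
        branches[values[i]].append(i)
    ratios = branch_ratios(values, weights)
    conditional = sum(
        ratios[v] * weighted_entropy([labels[i] for i in idx], [weights[i] for i in idx])
        for v, idx in branches.items()
    )
    return rho * (ent - conditional)


values = ["high", "high", "low", "low", None]   # fifth sample is missing the attribute
labels = ["pass", "pass", "fail", "fail", "pass"]
weights = [1.0] * 5                             # all weights start at 1 at the root
print(gain_with_missing(values, labels, weights))  # 0.8
# The fifth sample enters every branch with weight w_x * r~_v.
print({v: weights[4] * r for v, r in branch_ratios(values, weights).items()})  # {'high': 0.5, 'low': 0.5}
```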

Analysis of the factors influencing student performance
Establishment of the data cube

From primary school through university and into working life, we take many examinations, and each has its own rules and scoring methods. To simplify the research and analysis, high school examinations are used throughout this study.

High school is the most important stage of our entire academic career, because the importance of the college entrance examination is self-evident; for some students, it directly determines the course of their lives. Assessment and scoring methods for the national college entrance examination currently differ between provinces, so the Chongqing college entrance examination is taken as an example here. In recent years, Chongqing has reformed its college entrance examination into the current new format: candidates are no longer divided into arts and science streams but freely choose three of six elective subjects, which lowers the difficulty and gives candidates more freedom.

Besides regular candidates, there are also art and sports candidates, for whom the scores in the three main subjects carry even more weight. Some universities also require these candidates to meet athletic or artistic standards. However, art and sports candidates make up only a small share of the total, so they are excluded here and the analysis focuses on regular candidates.

To analyze student scores, we build a data cube database from which data can be selected and exported for prediction and analysis. The database tables are shown in Tables 1-3.

Table 1. Student Information Table

| Code | Name | Data type | Field size |
| --- | --- | --- | --- |
| SPN | Surname and personal name | text | 8 |
| Sex | Sex | varchar | 2 |
| Ag | Age | int | 2 |
| NA | Ethnicity | text | 10 |
| ID_N | ID number | varchar | 18 |
| CN | Contact number | varchar | 11 |
| ID_S | Student ID | varchar | 10 |
| HA | Home address | text | 20 |

Table 2. Student Score Information Table

| Code | Name | Data type | Field size |
| --- | --- | --- | --- |
| SPN | Surname and personal name | text | 8 |
| Sex | Sex | varchar | 2 |
| S_ID | Student ID | varchar | 10 |
| NA | Ethnicity | text | 2 |
| TP | Total points | int | 3 |
| Cn | Chinese | int | 3 |
| MA | Mathematics | int | 3 |
| Es | English | int | 3 |
| PH | Physics / History | int | 3 |
| C | Chemistry | int | 3 |
| Bg | Geography | int | 3 |
| IP | Ideology and Politics | int | 3 |
| LT | Biology | int | 3 |

Table 3. Teacher Information Table

| Code | Name | Data type | Field size |
| --- | --- | --- | --- |
| SPN | Surname and personal name | text | 8 |
| TS | Teaching subject | varchar | 5 |
| WT | Head teacher (yes/no) | varchar | 2 |
| TC | Teaching class | varchar | 5 |
| Sex | Sex | varchar | 2 |
| Age | Age | int | 2 |
| NA | Ethnicity | text | 10 |
| ID_N | ID number | varchar | 20 |
| CN | Contact number | varchar | 20 |
| AD | Address | varchar | 10 |

When studying high school students' scores, the result of every exam should be recorded to avoid errors. A student score database is therefore built to support the analysis and prediction of the factors affecting student performance.

The teacher information table records each teacher's teaching situation. When analyzing the factors influencing student performance, it allows the relationship between students and teachers to be analyzed and predicted, which simplifies data mining at a later stage.
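The tables above can be created in any relational database. As a minimal sketch, the following uses Python's built-in `sqlite3` module with an in-memory database; only a subset of the columns of Tables 1 and 2 is declared, and the table names `student_info` and `student_score` and all records are hypothetical. The `GROUP BY` query illustrates one aggregate view (a "slice") of the data cube, here the average total score by sex.

```python
import sqlite3

# In-memory database for the sketch; a real system would use a persistent database.
conn = sqlite3.connect(":memory:")

# Subset of Table 1 (student information).
conn.execute(
    """CREATE TABLE student_info (
        ID_S VARCHAR(10) PRIMARY KEY,  -- student ID
        SPN  TEXT,                     -- surname and personal name
        Sex  VARCHAR(2),
        Ag   INTEGER                   -- age
    )"""
)
# Subset of Table 2 (student scores), linked to Table 1 through the student ID.
conn.execute(
    """CREATE TABLE student_score (
        S_ID VARCHAR(10) REFERENCES student_info(ID_S),
        TP   INTEGER,  -- total points
        Cn   INTEGER,  -- Chinese
        MA   INTEGER,  -- Mathematics
        Es   INTEGER   -- English
    )"""
)

# Hypothetical records for illustration only.
conn.executemany("INSERT INTO student_info VALUES (?, ?, ?, ?)", [
    ("S001", "Student A", "M", 17),
    ("S002", "Student B", "F", 17),
    ("S003", "Student C", "F", 18),
])
conn.executemany("INSERT INTO student_score VALUES (?, ?, ?, ?, ?)", [
    ("S001", 520, 105, 110, 100),
    ("S002", 610, 120, 125, 118),
    ("S003", 580, 112, 120, 115),
])

# One slice of the data cube: average total score by sex.
rows = conn.execute(
    """SELECT i.Sex, AVG(s.TP)
       FROM student_info AS i JOIN student_score AS s ON i.ID_S = s.S_ID
       GROUP BY i.Sex
       ORDER BY i.Sex"""
).fetchall()
print(rows)  # [('F', 595.0), ('M', 520.0)]
```

Grouping by other columns (for example a teacher or class from Table 3, once joined) gives further slices in the same way.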

Experimental analysis of the influencing factors
Experimental analysis of the innate factors

It is often said that people's IQs do not differ much, and in everyday life no one admits that their IQ is lower than others'. Yet academic performance varies widely, and in fact IQ does differ considerably between people. IQ is largely influenced by genetic factors and cannot be changed. Whether IQ affects performance, however, must be confirmed by a comparative experiment. The distribution of score levels for different IQ levels is shown in Table 4.

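Table 4. Statistics of score levels for different IQ levels (number of students in each total-score band)

| IQ level | Under 400 | 400-500 | 500-600 | 600-700 | 700-750 |
| --- | --- | --- | --- | --- | --- |
| 80-90 | 25 | 20 | 5 | 0 | 0 |
| 90-100 | 50 | 150 | 90 | 10 | 0 |
| 100-110 | 80 | 195 | 172 | 50 | 3 |
| 110-120 | 15 | 50 | 80 | 30 | 5 |
| 120-130 | 0 | 3 | 5 | 10 | 2 |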

For Figure 2, 1000 students were randomly selected and first given an IQ test to determine their approximate IQ level; their exam scores were then tallied to build the chart.

Figure 2.

Statistical chart of performance levels for different IQs

Next, we examined another important innate factor: whether comprehension ability helps with learning Chinese. In this experiment, 300 students were randomly selected, roughly grouped by comprehension ability, and their Chinese scores tallied. The results are shown in Table 5 and Figure 3. They indicate that comprehension ability is very helpful for learning Chinese: 33 of the 40 students at ability level 9-10 scored 110 or above, compared with 1 of the 60 students at level 1-4.

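Table 5. Statistical table of the influence of comprehension ability on Chinese performance (number of students in each Chinese score band)

| Comprehension ability level | Below 90 | 90-100 | 100-110 | 110-120 | Above 120 |
| --- | --- | --- | --- | --- | --- |
| 1-4 | 20 | 30 | 9 | 1 | 0 |
| 5-8 | 48 | 100 | 35 | 15 | 2 |
| 9-10 | 0 | 0 | 7 | 20 | 13 |
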
Figure 3.

Statistical chart of the influence of comprehension ability on Chinese performance

Experimental analysis of the educational environment

Within the broad category of the educational environment, the family environment and the teacher-student relationship are the two major factors examined here.

The first is a comparative analysis of the family environment. Wealthy and poor families provide very different environments, which may affect students' grades, although it cannot be known in advance whether the effect is positive or negative.

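As Table 6 and Figure 4 show, students from poor families generally score lower than students from wealthy families: of the 200 students in each group, 22 from poor families scored 600 or above, compared with 58 from wealthy families. At the same time, family hardship can also motivate students to improve their grades.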

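Table 6. Statistical table of the impact of the family environment on performance (number of students in each total-score band)

| Family background | Under 400 | 400-500 | 500-600 | 600-700 | 700-750 |
| --- | --- | --- | --- | --- | --- |
| Poor family | 30 | 100 | 48 | 20 | 2 |
| Wealthy family | 10 | 32 | 100 | 50 | 8 |
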
Figure 4.

Statistical diagram of the influence of the family environment on performance
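The decision tree criteria described earlier can also be applied directly to the counts in Table 6. The sketch below is illustrative and is not an analysis performed in the article: it treats each of the 400 students in Table 6 as one sample, the total-score band as the class label, and family background as the only attribute, and computes the information gain $\mathrm{Gain}(D, a)$ from the contingency table. The helper names are our own.

```python
from math import log2
from typing import List


def entropy_from_counts(counts: List[float]) -> float:
    """Ent = -sum_k p_k log2 p_k from raw class counts (zero counts contribute nothing)."""
    total = sum(counts)
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


def info_gain_from_table(table: List[List[float]]) -> float:
    """Gain(D, a) for a contingency table: rows = values of a, columns = classes."""
    n = sum(sum(row) for row in table)
    class_totals = [sum(col) for col in zip(*table)]  # class counts of the whole set D
    conditional = sum(sum(row) / n * entropy_from_counts(row) for row in table)
    return entropy_from_counts(class_totals) - conditional


# Table 6: rows = family background; columns = total-score bands
# (under 400, 400-500, 500-600, 600-700, 700-750).
family_table = [
    [30, 100, 48, 20, 2],   # poor family
    [10, 32, 100, 50, 8],   # wealthy family
]

gain = info_gain_from_table(family_table)
print(f"Gain(D, family background) = {gain:.3f} bits")
```

A larger gain would make an attribute a more attractive split for ID3. The same function could be applied to the other two-row tables, but they have different sample sizes, so their gains describe different datasets.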

Next is the analysis of the teacher-student relationship. A poor relationship between teachers and students can make students unwilling to learn or lead them to misbehave, so teachers' guidance matters not only in lectures but in every other aspect of school life, all of which affect students' grades. The comparison between a class with a good teacher-student relationship and a class with a poor one is shown in Table 7 and Figure 5.

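Table 7. Statistical table of the influence of the teacher-student relationship on class performance (total scores)

| Class | Lowest total score | Highest total score | Average total score |
| --- | --- | --- | --- |
| Class with a good teacher-student relationship | 380 | 650 | 560 |
| Class with a poor teacher-student relationship | 250 | 580 | 500 |
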
Figure 5.

Statistical diagram of the influence of teacher-student relationship on class performance

Experimental analysis of personal factors

After an exam, we usually reflect on ourselves and our shortcomings, and this is exactly the point of this section: the influence of personal factors. Many students believe there is nothing wrong with themselves, yet their scores fail to improve; personal factors may be the cause. Personal factors, as used here, are equivalent to acquired factors: factors that students control themselves and that others cannot manage for them.

The first personal factor is one's own effort, which is very important. As the saying goes, talent alone is not formidable; what is formidable is talent combined with hard work. Hard work is therefore as important as talent, and neither can be lacking. The results are shown in Table 8 and Figure 6.

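Table 8. Statistical table of the importance of effort (number of students in each total-score band)

| Student view | Under 400 | 400-500 | 500-600 | 600-700 | 700-750 |
| --- | --- | --- | --- | --- | --- |
| Effort is important | 10 | 15 | 18 | 20 | 20 |
| Effort does not matter | 10 | 5 | 2 | 0 | 0 |
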
Figure 6.

Statistical plot of effort importance

These are the students' responses on the importance of effort. Most students consider effort important (83 of the 100 respondents in Table 8), and the data reflect this: every student who scored 600 or above belongs to this group. The second factor is learning motivation, which is hard to come by and hard to pin down. If hard work corresponds to study time, motivation corresponds to learning efficiency: with greater efficiency, the same effort yields twice the result, and scores rise. Motivation has many sources; even a very small event or action can become one. The results are shown in Table 9 and Figure 7.

Table 9. Statistical table of the influence of motivation on learning (number of students in each total-score band)

| Motivation | Under 400 | 400-500 | 500-600 | 600-700 | 700-750 |
| --- | --- | --- | --- | --- | --- |
| Motivated | 5 | 10 | 15 | 18 | 20 |
| Unmotivated | 15 | 10 | 5 | 2 | 0 |

Figure 7.

Statistical diagram of the influence of motivation on learning

Conclusion

Academic performance has always been an essential part of a student's career and a reflection of learning outcomes, and under China's exam-oriented education system it is the most important measure of all. In fact, students everywhere are ranked by academic performance; what differs is only how much weight performance carries. This article has analyzed the factors affecting students' performance. Intelligence cannot be changed, but environmental and psychological factors can be adjusted: students can manage their state of mind and approach each day of study in good condition. I hope students will study attentively, and I also hope parents and teachers will pay closer attention to each child's education. This is not a burden students can bear alone; helping students improve requires the efforts of everyone involved.