Analysis of Factors Influencing Public Foundation Course Performance among University Students Based on Decision Tree Models
Published online: 27 Feb 2025
Received: 24 Sep 2024
Accepted: 15 Jan 2025
DOI: https://doi.org/10.2478/amns-2025-0132
© 2025 Hongshun Zhang, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Against the backdrop of China's examination-oriented education system, academic performance has become a significant issue affecting students' academic progress, family harmony, and personal development. Identifying the factors that influence academic performance and exploring ways to improve it are crucial for enhancing learning outcomes and teaching quality. As China's education system expands, schools are increasingly focused on student data, not just storing it but analyzing it to understand each student's situation and provide tailored teaching that boosts performance. The development of big data and the pandemic-driven shift to online classes call for more sophisticated data processing to monitor students' real-time conditions and ensure educational quality. The factors affecting academic performance are multifaceted. As with an iceberg, the visible part represents students' academic performance and grades, while the underlying factors often go unnoticed, so analyzing these underlying factors is essential. Studies have used appropriate CBR-KBS models to predict and evaluate student datasets [1], explored the sources of and reasons for the factors influencing academic performance in Portuguese students [2], and examined the impact of various factors on learning quality using decision tree models [3]. A decision tree-based system for analyzing and predicting university student performance has been developed to help students overcome weaknesses and improve their grades [4]. A comparison of the accuracy of three decision tree algorithms led to performance predictors that help teachers focus on students [5]. Four classification algorithms were applied to predict student performance, improving the accuracy of decision tree algorithms [6]. Decision tree algorithms combined with educational data mining methods have been used to predict student performance [7].
Classification techniques from data mining have been employed to uncover patterns between admission and graduation scores [8]. One study discusses three student performance prediction models, focusing on how to improve scores using the J48 decision tree algorithm [9]. Another introduces integrated multi-classifier prediction of student performance based on three complementary algorithms [10]. The ID3 decision tree induction algorithm has been used to build a performance prediction model that supports struggling students [11]. A decision tree model was established to automatically detect anomalies in student exam scores [12]. Decision tree algorithms have been applied to student databases to predict final grades and identify weaker students who need additional attention [13]. Educational data mining and decision tree algorithms have been used to analyze the relationship between quantitative and qualitative data on student performance [14]. Decision tree methods have proven useful in predicting exam scores [15]. Research based on more than 80,000 score records from a university in Beijing investigated course relevance and score prediction [16]. To aid student learning and school management, a study of the factors influencing student performance was conducted using decision tree algorithms [17]. To address the difficulty of uncovering useful information in scores, a statistics and analysis system based on decision tree technology was designed [18]. Summarizing extensive research, a new approach was proposed for developing an early warning system for exam scores using data mining techniques [19]. Data mining algorithms and deep learning techniques have been applied to analyze student behavior, leading to improved methods for predicting academic performance [20].
Cognition, often associated with IQ tests, includes intelligence: the ability to solve problems or create practical solutions in everyday life. Current scientific research divides intelligence into two primary types: fluid and crystallized.
Each individual possesses multiple intelligences, and the combination of different intelligences in the same learning scenario can vary in effectiveness, influencing academic performance.
Family Environment: The thoughts and behaviors of parents and relatives significantly affect a child's psychological development. A harmonious family fosters positive social interactions, while a dysfunctional family can lead to extreme thoughts or heightened resistance.
School Environment:
Academic Pressure: Can cause anxiety, as students bear the burden of expectations.
Peer Competition: Healthy competition is beneficial, but unhealthy competition can harm both academic performance and peer relationships.
Teacher-Student Relationship: Teachers should reflect on their practices to maintain positive relationships and strengthen connections with students.
Social Environment: Advanced communication technologies facilitate easier interaction but can also lead to issues like social coldness and reduced trust. Negative societal views, such as the belief that education is unnecessary for success, can mislead students. It is crucial to correct such misconceptions and provide psychological support.
These factors collectively influence students' academic performance in public foundation courses, highlighting the need for a comprehensive approach to education.
As the name suggests, a decision tree is modeled on a tree. A tree consists of roots, a trunk, leaves, and other indispensable parts, so a decision tree contains corresponding elements: a root node, internal nodes, and leaf nodes. A decision tree has the overall shape of an inverted tree, with the root at the top and the leaves at the bottom. It starts from a root node, branches out through multiple internal nodes, and ends in several leaf nodes. In the decision tree diagram, rectangular boxes represent decision nodes, circles represent state nodes, and triangles represent result nodes (leaf nodes). The decision tree flowchart is shown in Figure 1.

Decision tree flowchart
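The root/internal/leaf structure described above can be sketched as a small recursive data type. This is a minimal sketch, not the paper's implementation: it simplifies the three node types of the diagram into test nodes (which check one attribute) and leaf nodes (which carry a result), and the attribute names `effort` and `motivation` and the labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Node:
    """A decision tree node (simplified: test nodes and leaf nodes only)."""
    attribute: Optional[str] = None                            # attribute tested here; None for a leaf
    children: Dict[Any, "Node"] = field(default_factory=dict)  # attribute value -> child node
    label: Optional[str] = None                                # result stored at a leaf

    def is_leaf(self) -> bool:
        return self.attribute is None


def predict(node: Node, sample: Dict[str, Any]) -> Optional[str]:
    """Start at the root and follow the branch matching the sample's value
    until a leaf is reached. Raises KeyError for an unseen attribute value."""
    while not node.is_leaf():
        node = node.children[sample[node.attribute]]
    return node.label


# Hypothetical tree: attributes and labels are illustrative only.
root = Node(attribute="effort", children={
    "high": Node(attribute="motivation", children={
        "high": Node(label="good"),
        "low": Node(label="average"),
    }),
    "low": Node(label="poor"),
})

print(predict(root, {"effort": "high", "motivation": "low"}))  # average
```

Prediction is a single root-to-leaf walk, so its cost grows with the depth of the tree rather than with the number of training samples.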
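Information entropy can be understood as a measure of the uncertainty of information: the more evenly the probabilities are spread across the possible outcomes, the larger the entropy. For a sample set $D$ in which the proportion of samples belonging to class $k$ is $p_k$ ($k = 1, 2, \dots, |Y|$), the information entropy is

$$\mathrm{Ent}(D) = -\sum_{k=1}^{|Y|} p_k \log_2 p_k$$

The smaller $\mathrm{Ent}(D)$, the higher the purity of $D$.
Suppose the sample set $D$ is divided on a discrete attribute $a$ that has $V$ possible values $\{a^1, a^2, \dots, a^V\}$. This generates $V$ branch nodes, where the $v$-th branch node contains $D^v$, the subset of samples in the original dataset $D$ whose value on $a$ is $a^v$. The information gain obtained by dividing $D$ on $a$ is

$$\mathrm{Gain}(D,a) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v)$$

At each step of the recursion, the ID3 algorithm selects the attribute with the largest information gain as the splitting attribute.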
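A minimal Python sketch of the entropy, information gain, and single ID3 selection step defined above, assuming discrete attributes stored as dictionaries. The attribute names and toy data are hypothetical, and the full recursive tree construction is omitted.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Ent(D) = -sum_k p_k * log2(p_k), with p_k the share of class k in D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows: List[Dict[str, str]], labels: Sequence[str], attr: str) -> float:
    """Gain(D, a) = Ent(D) - sum_v |D^v|/|D| * Ent(D^v) for a discrete attribute."""
    n = len(labels)
    branches: Dict[str, List[str]] = {}
    for row, y in zip(rows, labels):
        branches.setdefault(row[attr], []).append(y)  # D^v: samples with value v
    return entropy(labels) - sum(len(b) / n * entropy(b) for b in branches.values())


def id3_choose(rows: List[Dict[str, str]], labels: Sequence[str], attrs: List[str]) -> str:
    """One ID3 step: pick the attribute with the largest information gain."""
    return max(attrs, key=lambda a: info_gain(rows, labels, a))


# Hypothetical toy data
rows = [
    {"effort": "high", "iq": "mid"},
    {"effort": "high", "iq": "low"},
    {"effort": "low", "iq": "mid"},
    {"effort": "low", "iq": "low"},
]
labels = ["pass", "pass", "fail", "fail"]
print(id3_choose(rows, labels, ["effort", "iq"]))  # effort
```

Here splitting on `effort` yields two pure branches (gain 1 bit), while `iq` leaves both branches evenly mixed (gain 0), so ID3 picks `effort`.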
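Information gain favors attributes with many possible values, so the information gain ratio is used to correct this bias. The gain ratio is defined as

$$\mathrm{Gain\_ratio}(D,a) = \frac{\mathrm{Gain}(D,a)}{\mathrm{IV}(a)}, \qquad \mathrm{IV}(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|}\log_2\frac{|D^v|}{|D|}$$

where $\mathrm{Gain}(D,a)$ is the information gain of splitting $D$ on attribute $a$, $D^v$ is the subset of $D$ taking the $v$-th of the $V$ values of $a$, and $\mathrm{IV}(a)$, the intrinsic value of $a$, grows as $a$ takes more values.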
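The effect of dividing by $\mathrm{IV}(a)$ can be seen in a short sketch. It assumes each attribute is given as a list of values aligned with the class labels; the student-ID and effort columns are hypothetical. Note that $\mathrm{IV}(a)$ is simply the entropy of the attribute's own value distribution, so the same helper computes both quantities.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def entropy(values: Sequence[str]) -> float:
    """-sum p * log2(p) over the relative frequencies of the values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(xs: Sequence[str], ys: Sequence[str]) -> float:
    """Gain_ratio(D, a) = Gain(D, a) / IV(a).

    xs: value of attribute a for each sample; ys: class label of each sample.
    """
    n = len(ys)
    branches: Dict[str, List[str]] = defaultdict(list)
    for x, y in zip(xs, ys):
        branches[x].append(y)
    gain = entropy(ys) - sum(len(b) / n * entropy(b) for b in branches.values())
    iv = entropy(xs)  # IV(a) = -sum |D^v|/|D| * log2(|D^v|/|D|)
    return gain / iv if iv > 0 else 0.0  # a single-valued attribute has no useful split


# Hypothetical data: a unique student ID versus a two-valued effort attribute
ids = ["s1", "s2", "s3", "s4"]
effort = ["high", "high", "low", "low"]
ys = ["pass", "pass", "fail", "fail"]
print(gain_ratio(ids, ys), gain_ratio(effort, ys))  # 0.5 1.0
```

Both attributes have an information gain of 1 bit, but the ID column, which has one value per student, is penalized by its larger intrinsic value.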
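The CART decision tree selects splitting attributes using the Gini index. The purity of the dataset $D$ can be measured by its Gini value:

$$\mathrm{Gini}(D) = \sum_{i=1}^{n} p(x_i)\bigl(1 - p(x_i)\bigr) = 1 - \sum_{i=1}^{n} p(x_i)^2$$

where $p(x_i)$ is the probability of occurrence of class $x_i$ and $n$ is the number of classes. $\mathrm{Gini}(D)$ reflects the probability that two samples drawn at random from dataset $D$ belong to different classes, so the smaller $\mathrm{Gini}(D)$, the higher the purity of $D$.
Under the condition of attribute $A$, the Gini index of sample set $D$ is defined as

$$\mathrm{Gini}(D,A) = \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Gini}(D^v)$$

where $D^v$ is the subset of $D$ taking the $v$-th of the $V$ values of $A$. The attribute with the smallest Gini index after splitting is chosen as the splitting attribute.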
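A minimal sketch of the two Gini formulas above, assuming discrete attribute columns aligned with the labels (all data hypothetical). It follows the weighted multi-way form given in the text; the full CART algorithm instead evaluates binary partitions of each attribute, which this sketch does not do.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini(D) = 1 - sum_i p(x_i)^2: chance that two random draws disagree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_index(values: Sequence[str], labels: Sequence[str]) -> float:
    """Gini(D, A) = sum_v |D^v|/|D| * Gini(D^v)."""
    n = len(labels)
    branches: Dict[str, List[str]] = defaultdict(list)
    for v, y in zip(values, labels):
        branches[v].append(y)
    return sum(len(b) / n * gini(b) for b in branches.values())


def choose_by_gini(columns: Dict[str, Sequence[str]], labels: Sequence[str]) -> str:
    """Pick the attribute whose split has the smallest Gini index."""
    return min(columns, key=lambda a: gini_index(columns[a], labels))


# Hypothetical data
labels = ["pass", "pass", "fail", "fail"]
columns = {
    "effort": ["high", "high", "low", "low"],
    "iq": ["mid", "low", "mid", "low"],
}
print(choose_by_gini(columns, labels))  # effort
```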
Because a decision tree keeps splitting until it separates the classes of the training set almost perfectly, it tends to depend too heavily on the training samples and overfit. Pruning addresses this problem, and there are two pruning strategies:
Pre-pruning: evaluate each potential split during construction and branch only if the split is worthwhile.
Post-pruning: after constructing a complete decision tree, evaluate the necessity of each branch from the bottom up.
Many post-pruning methods can be used with classification and regression trees (CART). In cost-complexity pruning, the surface error rate gain of each internal node $t$ is calculated as

$$\alpha = \frac{C(t) - C(T_t)}{|T_t| - 1}, \qquad C(t) = r(t)\,p(t)$$

where
$|T_t|$: the number of leaf nodes in the subtree $T_t$ rooted at $t$
$C(t)$: the error cost of node $t$ if its subtree is pruned and $t$ becomes a single-node tree
$r(t)$: the error rate of node $t$
$p(t)$: the proportion of all samples that fall on node $t$
$C(T_t)$: the error cost of the subtree $T_t$ rooted at $t$ if $t$ is not pruned, equal to the sum of the error costs of all leaf nodes of $T_t$
The node with the smallest $\alpha$ is pruned first.
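The surface error rate gain can be computed directly from per-node counts. This is a minimal sketch, assuming each node records how many training samples reach it and how many of those would be misclassified if the node were a leaf; the tree and its counts are hypothetical. Since $r(t)\,p(t) = (e_t/n_t)(n_t/N) = e_t/N$, the cost of a node is just its error count over the total sample count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    n_samples: int  # training samples reaching this node
    n_errors: int   # samples misclassified if this node were made a leaf
    children: List["TreeNode"] = field(default_factory=list)


def leaves(node: TreeNode) -> List[TreeNode]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def internal_nodes(node: TreeNode) -> List[TreeNode]:
    if not node.children:
        return []
    return [node] + [m for child in node.children for m in internal_nodes(child)]


def cost(node: TreeNode, n_total: int) -> float:
    """C(t) = r(t) * p(t), with r(t) the node error rate and p(t) its sample share."""
    return (node.n_errors / node.n_samples) * (node.n_samples / n_total)


def alpha(node: TreeNode, n_total: int) -> float:
    """Surface error rate gain: (C(t) - C(T_t)) / (|T_t| - 1)."""
    subtree_leaves = leaves(node)
    c_subtree = sum(cost(leaf, n_total) for leaf in subtree_leaves)
    return (cost(node, n_total) - c_subtree) / (len(subtree_leaves) - 1)


def weakest_link(root: TreeNode) -> TreeNode:
    """The internal node with the smallest alpha is the first to prune."""
    return min(internal_nodes(root), key=lambda t: alpha(t, root.n_samples))


# Hypothetical tree with 100 training samples
a = TreeNode(60, 10, [TreeNode(40, 2), TreeNode(20, 3)])
root = TreeNode(100, 40, [a, TreeNode(40, 5)])
print(weakest_link(root) is a)  # True
```

In this example $\alpha \approx 0.15$ at the root and $\alpha \approx 0.05$ at node `a`, so collapsing `a` into a leaf costs the least additional error per removed leaf.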
For a continuous attribute, using every distinct value as a branch is not feasible, so the attribute must be discretized. The common method is bi-partition (dichotomy). Its basic idea is: given the sample set D and a continuous attribute a, find a split point t that divides D into the samples whose value on a is at most t and those whose value is greater than t.
First, sort the values of a that appear in D in ascending order and take the midpoint of each pair of adjacent values as a candidate split point.
For each candidate split point, calculate the information gain obtained by dividing D at that point.
Select the partition point with the maximum information gain as the optimal partition point.
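The three bi-partition steps above can be sketched as follows, assuming numeric attribute values aligned with class labels. The IQ values and labels are hypothetical.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_split_point(values: Sequence[float], labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Return (t, gain) for the split a <= t / a > t with maximum information gain.

    Returns (None, -1.0) when the attribute has fewer than two distinct values.
    """
    n = len(labels)
    base = entropy(labels)
    distinct = sorted(set(values))
    # Step 1: candidate points are midpoints of adjacent sorted distinct values
    candidates = [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]
    best_t, best_gain = None, -1.0
    for t in candidates:
        # Step 2: information gain of dividing D at t
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        # Step 3: keep the point with the largest gain
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Hypothetical IQ scores and performance labels
iq = [85, 95, 105, 115, 125, 88]
grade = ["low", "low", "high", "high", "high", "low"]
print(best_split_point(iq, grade))  # (100.0, 1.0)
```

Only midpoints between distinct values need to be tested, because any threshold between two adjacent values produces the same partition.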
In practice we often encounter incomplete samples, that is, samples with some attribute values missing. Simply discarding such samples wastes a great deal of information, so two problems must be solved when attribute values are missing:
How should the splitting attribute be selected when some of its values are missing?
Given a splitting attribute, how should a sample whose value on that attribute is missing be assigned to a branch?
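Assume each sample $x$ in $D$ is given a weight $w_x$, initialized to 1 at the root node. Let $\tilde D$ be the subset of samples in $D$ with no missing value on attribute $a$, $\tilde D_k$ the samples of $\tilde D$ belonging to class $k$, and $\tilde D^v$ the samples of $\tilde D$ taking the $v$-th value $a^v$ of $a$. Then define

$$\rho = \frac{\sum_{x\in\tilde D} w_x}{\sum_{x\in D} w_x}, \qquad \tilde p_k = \frac{\sum_{x\in\tilde D_k} w_x}{\sum_{x\in\tilde D} w_x}, \qquad \tilde r_v = \frac{\sum_{x\in\tilde D^v} w_x}{\sum_{x\in\tilde D} w_x}$$

The information gain is computed on the subset $\tilde D$, and the final information gain equals the information gain of this subset multiplied by its proportion $\rho$ of the sample set:

$$\mathrm{Gain}(D,a) = \rho \times \Bigl(\mathrm{Ent}(\tilde D) - \sum_{v=1}^{V}\tilde r_v\,\mathrm{Ent}(\tilde D^v)\Bigr), \qquad \mathrm{Ent}(\tilde D) = -\sum_{k}\tilde p_k \log_2 \tilde p_k$$

If a sample has a missing value on the splitting attribute, it is placed into all branch nodes at the same time with different weights. In the branch node for value $a^v$, the weight of the sample becomes

$$w_x \leftarrow \tilde r_v \cdot w_x$$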
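A minimal sketch of the weighted gain and the weight-splitting rule above. It assumes a missing value is encoded as `None` and that at least one sample has a known value; the data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def weighted_entropy(labels: Sequence[str], weights: Sequence[float]) -> float:
    """Ent with p~_k = (weight of class k) / (total weight)."""
    total = sum(weights)
    per_class: Dict[str, float] = defaultdict(float)
    for y, w in zip(labels, weights):
        per_class[y] += w
    return -sum(w / total * math.log2(w / total) for w in per_class.values() if w > 0)


def gain_with_missing(xs: Sequence[Optional[str]], labels: Sequence[str],
                      weights: Optional[Sequence[float]] = None) -> float:
    """Gain(D, a) = rho * (Ent(D~) - sum_v r~_v * Ent(D~^v)); None marks a missing value."""
    if weights is None:
        weights = [1.0] * len(labels)  # weights start at 1 in the root node
    known = [(x, y, w) for x, y, w in zip(xs, labels, weights) if x is not None]
    known_w = sum(w for _, _, w in known)
    rho = known_w / sum(weights)
    branches: Dict[str, Tuple[List[str], List[float]]] = defaultdict(lambda: ([], []))
    for x, y, w in known:
        branches[x][0].append(y)
        branches[x][1].append(w)
    ent_known = weighted_entropy([y for _, y, _ in known], [w for _, _, w in known])
    remainder = sum(sum(ws) / known_w * weighted_entropy(ys, ws) for ys, ws in branches.values())
    return rho * (ent_known - remainder)


def branch_ratios(xs: Sequence[Optional[str]], weights: Sequence[float]) -> Dict[str, float]:
    """r~_v: each value's share of the weight among samples whose value is known."""
    known_w = sum(w for x, w in zip(xs, weights) if x is not None)
    ratios: Dict[str, float] = defaultdict(float)
    for x, w in zip(xs, weights):
        if x is not None:
            ratios[x] += w / known_w
    return dict(ratios)


def route_missing(weight: float, ratios: Dict[str, float]) -> Dict[str, float]:
    """A sample missing attribute a enters every branch v with weight r~_v * w_x."""
    return {v: r * weight for v, r in ratios.items()}


# Hypothetical data: the fifth student's effort level is unknown
effort = ["high", "high", "low", "low", None]
result = ["pass", "pass", "fail", "fail", "pass"]
print(gain_with_missing(effort, result))
print(route_missing(1.0, branch_ratios(effort, [1.0] * 5)))
```

Here $\rho = 4/5$ and the four known samples split into pure branches, so the gain is $0.8 \times 1 = 0.8$; the student with the missing value enters both branches with weight 0.5.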
From primary school to university, and even after entering society, we take many examinations, but their rules and scoring methods differ. To facilitate research and analysis, high school examinations are used uniformly here.
High school is the most important stage of the entire academic career, because the importance of the college entrance examination is self-evident; for some students it directly determines the course of their lives. The assessment and scoring methods of the national college entrance examination currently differ between provinces, so the Chongqing college entrance examination is taken as an example here. In recent years, Chongqing has reformed its college entrance examination into the new format: students are no longer divided into arts and science streams but freely choose three of six elective subjects, which lowers the difficulty coefficient and increases the degree of freedom.
Besides regular candidates, there are also art and sports candidates, for whom scores in the three main subjects carry even more weight. Some universities also require athletic or artistic qualifications. However, art and sports candidates make up only a small fraction of the total, so they are set aside and the analysis focuses on regular candidates.
To analyze the results, we need to build a data cube database from which data can be selected and exported for prediction and analysis. The database tables are shown in Tables 1-3.
Student Information Table
Code | Name | Data type | Field size |
---|---|---|---|
SPN | Surname and personal name | text | 8 |
Sex | Sex | varchar | 2 |
Ag | Age | int | 2 |
NA | Nation | text | 10 |
ID_N | ID number | varchar | 18 |
CN | Contact number | varchar | 11 |
ID_S | Student ID | varchar | 10 |
HA | Home address | text | 20 |
Student Achievement Information Form
Code | Name | Data type | Field size |
---|---|---|---|
SPN | Surname and personal name | text | 8 |
Sex | Sex | varchar | 2 |
S_ID | Student ID | varchar | 10 |
NA | Nation | text | 2 |
TP | Total points | int | 3 |
Cn | Chinese | int | 3 |
MA | Mathematics | int | 3 |
Es | English | int | 3 |
PH | Physics / History | int | 3 |
C | Chemistry | int | 3 |
Bg | Geography | int | 3 |
IP | Ideology and politics | int | 3 |
LT | Biology | int | 3 |
Teacher Information Collection Form
Code | Name | Data type | Field size |
---|---|---|---|
SPN | Surname and personal name | text | 8 |
TS | Teaching subjects | varchar | 5 |
WT | Head teacher or not | varchar | 2 |
TC | Teaching class | varchar | 5 |
Sex | Sex | varchar | 2 |
Age | Age | int | 2 |
NA | Nation | text | 10 |
ID_N | ID number | varchar | 20 |
CN | Contact number | varchar | 20 |
AD | Address | varchar | 10 |
When studying high school students' scores, the result of every exam should be recorded to avoid errors. A database of student scores should therefore be built to better analyze and predict the factors that affect student performance.
The teacher information collection table records statistics on teachers' teaching. When analyzing the factors that influence student performance, it allows this paper to analyze and predict the relationship between students and teachers, which simplifies the later data mining.
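To illustrate how scores might be stored and then selected and exported in the total-score bands used in the tables that follow, here is a minimal sketch using Python's built-in `sqlite3` module. It is an assumption-laden illustration, not the paper's database: the table name `achievement`, the reduced column set (a subset of the Student Achievement Information Form codes), the treatment of each band's lower bound as inclusive, and the student records are all hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical, reduced version of the Student Achievement Information Form
SCHEMA = """
CREATE TABLE achievement (
    S_ID VARCHAR(10) PRIMARY KEY,  -- Student ID
    SPN  TEXT,                     -- Surname and personal name
    Sex  VARCHAR(2),
    TP   INTEGER,                  -- Total points
    Cn   INTEGER,                  -- Chinese
    MA   INTEGER,                  -- Mathematics
    Es   INTEGER                   -- English
);
"""

# Assumed banding: lower bound inclusive, e.g. 500 falls in '500-600'
BAND_QUERY = """
SELECT CASE
         WHEN TP < 400 THEN 'Under 400'
         WHEN TP < 500 THEN '400-500'
         WHEN TP < 600 THEN '500-600'
         WHEN TP < 700 THEN '600-700'
         ELSE '700-750'
       END AS band,
       COUNT(*)
FROM achievement
GROUP BY band
"""


def band_counts(rows: List[Tuple]) -> Dict[str, int]:
    """Load score records into an in-memory database and count students per band."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany(
            "INSERT INTO achievement (S_ID, SPN, Sex, TP, Cn, MA, Es) VALUES (?, ?, ?, ?, ?, ?, ?)",
            rows,
        )
        return dict(conn.execute(BAND_QUERY).fetchall())
    finally:
        conn.close()


# Hypothetical records
records = [
    ("S001", "Zhang San", "M", 380, 90, 85, 80),
    ("S002", "Li Si", "F", 520, 110, 120, 100),
    ("S003", "Wang Wu", "M", 545, 105, 115, 110),
]
print(band_counts(records))
```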
We often hear that people's IQs do not differ much, and in reality no one will admit that their IQ is lower than anyone else's. Yet academic performance varies greatly, and in fact IQ does differ considerably between people. IQ is largely influenced by genetic factors, which cannot be changed. Whether IQ affects performance, however, still has to be confirmed by a comparative experiment. Statistics of grade levels for different IQs are shown in Table 4.
Statistics of grade levels for different IQs
IQ level | Under 400 | 400-500 | 500-600 | 600-700 | 700-750 |
---|---|---|---|---|---|
80-90 | 25 | 20 | 5 | 0 | 0 |
90-100 | 50 | 150 | 90 | 10 | 0 |
100-110 | 80 | 195 | 172 | 50 | 3 |
110-120 | 15 | 50 | 80 | 30 | 5 |
120-130 | 0 | 3 | 5 | 10 | 2 |
For Figure 2, 1000 students were randomly selected and first given an IQ test to determine their general IQ level; their exam scores were then tallied to build the chart.

Statistical chart of performance levels for different IQs
Next, we chose an even more important innate factor and compiled statistics on it: the experiment examines whether comprehension ability helps with learning Chinese. In this experiment, 300 students were randomly selected and roughly stratified by comprehension level, and their Chinese scores were then tallied. As Table 5 and Figure 3 show, comprehension ability is very helpful for learning Chinese.
Statistical table of the influence of comprehension ability on Chinese performance
Comprehension level | Below 90 | 90-100 | 100-110 | 110-120 | More than 120 |
---|---|---|---|---|---|
1-4 | 20 | 30 | 9 | 1 | 0 |
5-8 | 48 | 100 | 35 | 15 | 2 |
9-10 | 0 | 0 | 7 | 20 | 13 |

Statistical chart of the influence of comprehension ability on Chinese performance
Within the broad category of educational environment, the family environment and the teacher-student relationship are two major factors.
The first is a comparative experimental analysis of the home environment. Rich and poor family environments differ greatly, which may also affect students' grades, but whether the effect is good or bad cannot be known in advance.
According to Table 6 and Figure 4, students from poor families generally get somewhat worse grades than those from rich families, although a poorer background can also motivate students to improve their grades.
Statistical table of the impact of the family environment on performance
Family environment | Under 400 | 400-500 | 500-600 | 600-700 | 700-750 |
---|---|---|---|---|---|
Poor family | 30 | 100 | 48 | 20 | 2 |
Rich family | 10 | 32 | 100 | 50 | 8 |

Statistical diagram of the influence of the family environment on performance
Next is the experimental analysis of the teacher-student relationship. A poor relationship between teachers and students can make students unwilling to learn or lead them to misbehave. Teachers' guidance therefore matters greatly, not only in lectures but in other aspects as well, all of which affect students' grades. The results are shown in Table 7 and Figure 5.
Statistical table of the influence of teacher-student relationship on class performance
Teacher-student relationship | Lowest total score | Highest total score | Average score |
---|---|---|---|
Good | 380 | 650 | 560 |
Poor | 250 | 580 | 500 |

Statistical diagram of the influence of teacher-student relationship on class performance
After an exam, we generally reflect on ourselves and our shortcomings, which brings me to the next point: the influence of self-factors. Many students may think there is nothing wrong with themselves, yet their results never improve, and this may be caused by self-factors. The self-factors I mean are acquired factors, which students control themselves and which no one else can change for them.
The first is one's own effort, a very important factor. As the saying goes, talent alone is nothing to fear; what is formidable is a person with both talent and diligence. Hard work is therefore as important as talent, and neither can be lacking. The results are shown in Table 8 and Figure 6.
Statistical table of the importance of effort
Effort is important | 10 | 15 | 18 | 20 | 20 |
Hard work doesn't matter | 10 | 5 | 2 | 0 | 0 |

Statistical plot of effort importance
These are the students' views on the importance of effort: most think effort really is important, as the data reflect. Next comes learning motivation, which is rare and hard to pin down. If hard work is the time spent studying, motivation is the efficiency of learning. With better learning efficiency, the effect of our effort can double and our results will rise. Motivation has many sources, and even a very small thing or act can become one. The results are shown in Table 9 and Figure 7.
Statistical table of the influence of motivation on learning
5 | 10 | 15 | 18 | 20 | |
15 | 10 | 5 | 2 | 0 |

Statistical diagram of the influence of no motivation on learning
Academic performance has always been an essential part of a student's career and an embodiment of learning outcomes, and it matters most of all under China's examination-oriented education system. In fact, people everywhere in the world are sorted by academic performance; what differs is only how much weight it carries. This article has analyzed the factors that affect students' performance. Intelligence cannot be changed, but environmental and psychological factors can be adjusted: students can manage their state of mind and approach each day of study in good condition. I hope students will study attentively, and I also hope parents and teachers will pay more attention to each child's education. This is not a burden students can bear alone; it takes the efforts of everyone involved to help students do better.