Applied Mathematics and Nonlinear Sciences

This article applies probability and statistics to evaluate the ideological dynamics of college students and thereby understand their psychological state. First, the paper uses a web crawler to collect and analyze posts from official microblogs. Combining probability statistics with K-means clustering makes it possible to follow the psychological state of college students in near real time. The experimental results show that the hot topics of a given period can be identified using word-frequency statistics and clustering. The purpose of this paper is to propose corresponding countermeasures and approaches for the ideological and political work of college graduates. The model has a positive effect on cultivating college students' values and ways of thinking.


Introduction
By evaluating the mental health status of college students, the role of colleges and college counselors in student work can be strengthened. Students born after 1995 make up the main body of college graduates. They are active, open, and curious, but emotionally fragile, and their state of mind is easily swayed. Traditional ideological and political teaching is mainly carried out in classrooms [1]. This teaching method focuses on imparting ideological and political theory but cannot capture the psychological activities of students. This paper establishes a new probability and statistics model for mental health evaluation in colleges and universities. The purpose is to enable colleges and college counselors to understand the current state of students' mental health and to support teaching work in colleges and universities.
This paper aims to strengthen the ideological and political work of college graduates. First, it collects posts published by college students on Weibo. Second, it combines sentiment analysis, feature word extraction, and statistical cluster analysis to relate the content of these posts to college students' thinking.

Technical approach
The technical means studied in this paper are microblog acquisition and topic clustering. A crawler-based algorithm is used to obtain microblog information. The official microblogs considered include student association, university, college, and class microblogs. Accounts of this type have greater social influence and reach. A clustering technique is then used to group the posts by topic and obtain a taxonomy of topics [2]. From the types of topics, the state of students' thinking and its changes can be grasped. Posts are classified by keywords extracted from the topics. This method captures the issues that students are concerned about and thereby helps in understanding their psychological state.
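The keyword step can be illustrated with a minimal sketch. It assumes the posts have already been collected and are plain English text. Real Weibo posts would need a Chinese word segmenter first. The function name `hot_keywords`, the stopword list, and the sample posts are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "is", "for", "and", "of", "to", "in", "on"}


def hot_keywords(posts, top_n=3):
    """Return the top_n most frequent non-stopword tokens across all posts."""
    counts = Counter()
    for post in posts:
        # Lowercase the post and keep alphabetic tokens only.
        words = re.findall(r"[a-z]+", post.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up posts, for illustration only.
posts = [
    "Exam week is stressful",
    "Library seats for exam week",
    "Job fair and exam schedule",
]
top = hot_keywords(posts, top_n=2)
print(top)
```

Keywords that recur across many posts in a period are candidates for that period's hot topics, and posts can then be grouped by the keywords they contain.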

Hierarchical Clustering Model
Clustering is an unsupervised machine learning method that groups data by evaluating the similarity between samples. This paper uses the hierarchical clustering method to cluster and analyze large-scale university data. Based on the clustering results, a portrait of college students' ideological and political literacy is established [3]. The overall scheme is shown in Figure 1. The student data include categorical features such as gender and place of origin and numeric features such as age and sleep time. Since the data contain both kinds of features, a single statistical method often leads to unsatisfactory classification results.
Hierarchical clustering, as used here, is a clustering method for categorical data. It extends the K-means method: following the basic idea of K-means, it modifies how categorical attributes are measured and how cluster centers are computed, using a calculation for categorical attributes based on categorical features [4]. Because the data contain both numeric and categorical features, a method is needed that handles the two types simultaneously. While retaining the K-means and hierarchical clustering methods, the approach adopted here adds a measure of dissimilarity between cluster representatives and samples with mixed attributes. It is a representative clustering method for mixed features.

Distance measure
Numeric features are measured with the K-means distance and categorical features with the hierarchical clustering distance; the two are then combined to form the distance used by the model. Assume there are $n$ samples, each described by $m$ features, and let $x_i$ and $x_j$ denote two samples.
Numeric features are first normalized to the interval [0, 1], and the distance between the numeric parts is then computed [5]. As in the K-means method, the Euclidean distance is used:
$$d_r(x_i, x_j) = \sqrt{\sum_{l=1}^{p} \left(x_{il}^{r} - x_{jl}^{r}\right)^2} \qquad (1)$$
The hierarchical clustering algorithm uses the Hamming distance for categorical features:
$$d_c(x_i, x_j) = \sum_{l=1}^{q} \delta\left(x_{il}^{c}, x_{jl}^{c}\right) \qquad (2)$$
where $\delta(a, b) = 0$ when $a = b$ and $\delta(a, b) = 1$ when $a \neq b$. Here $x_{il}^{r}$ and $x_{jl}^{r}$ are numeric attributes, $x_{il}^{c}$ and $x_{jl}^{c}$ are categorical attributes, and $p$ and $q$ are the numbers of numeric and categorical features.
The two parts can be combined into a single dissimilarity measure for mixed attributes. Let $k$ be the number of clusters, let $Q_l$ denote the center selected for cluster $l$, and let $C_l$ be the set of samples assigned to it. The distance between a sample $x_i$ and the center $Q_l$ can be expressed as
$$d(x_i, Q_l) = d_r(x_i, Q_l) + \gamma\, d_c(x_i, Q_l) \qquad (3)$$
The loss function of hierarchical clustering can then be defined as
$$E = \sum_{l=1}^{k} \left(E_l^{r} + \gamma E_l^{c}\right) \qquad (4)$$
where $E_l^{r} = \sum_{x_i \in C_l} d_r(x_i, Q_l)$ is the overall loss of all numeric features in cluster $l$ and $E_l^{c} = \sum_{x_i \in C_l} d_c(x_i, Q_l)$ is the overall loss of all categorical features. The weight $\gamma$ influences the accuracy of clustering. When $\gamma = 0$, only the numeric attributes enter the calculation. As $\gamma$ increases, the weight of the categorical attributes increases [6], and for large $\gamma$ the categorical variables dominate the clustering. Choosing an appropriate $\gamma$ improves the clustering. The choice of $\gamma$ is related to the standard deviation of the numeric variables: if the standard deviation is normalized to 1, $\gamma$ is preferably set between 0.5 and 0.7.
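The combined distance can be sketched in a few lines. This is only an illustration under stated assumptions: the numeric part is taken as the plain Euclidean distance on features already scaled to [0, 1], the categorical part as the Hamming distance, and the weight on the categorical part is called `gamma` here. The function name and the sample values are hypothetical. The description resembles what is commonly called k-prototypes clustering, although the paper does not use that name.

```python
import math


def mixed_distance(x_num, y_num, x_cat, y_cat, gamma=0.5):
    """Distance between two samples with numeric and categorical parts."""
    # Numeric part: Euclidean distance over features scaled to [0, 1].
    numeric = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_num, y_num)))
    # Categorical part: Hamming distance, one unit per mismatching feature.
    categorical = sum(1 for a, b in zip(x_cat, y_cat) if a != b)
    # gamma weights the categorical part against the numeric part.
    return numeric + gamma * categorical


# Made-up samples: two numeric features and two categorical features each.
d = mixed_distance([0.2, 0.5], [0.5, 0.9], ["F", "urban"], ["M", "urban"], gamma=0.5)
d_numeric_only = mixed_distance([0.2, 0.5], [0.5, 0.9], ["F", "urban"], ["M", "urban"], gamma=0.0)
print(d, d_numeric_only)
```

Setting `gamma` to 0 reduces the measure to the numeric distance alone, which matches the remark that only numeric attributes are then used.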

Hybrid Hierarchical Clustering
The numeric features are normalized to a standard deviation of 1, and the weight $\gamma$ on the categorical distance is set to 0.5. The procedure of the hierarchical clustering algorithm is as follows:
Step 1: Randomly select $k$ initial cluster centers $\{Q_1, Q_2, \cdots, Q_k\}$ from the data set $X$.
Step 2: Traverse the data set $X$. Compute the distance from each sample to every cluster center according to Equation (3) and assign the sample to the cluster whose center is closest.
Step 3: After all samples are assigned, update each cluster center according to the feature type, using Equation (1) for the numeric features and Equation (2) for the categorical features.
Step 4: Use Equations (3) and (4) to compute the loss function.
Step 5: When the value of the new loss function is lower than the set threshold, or the number of iterations exceeds the set maximum, the procedure terminates [7] and the current cluster centers are taken as the result. Otherwise, repeat Steps 2, 3, and 4.
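The five steps can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the center update uses the mean for numeric features and the most frequent value for categorical features, the loss is accumulated during assignment, and iteration stops when the loss changes by less than `tol` or after `max_iter` rounds. All names, including the `init` parameter that fixes the initial centers for reproducibility, are hypothetical.

```python
import math
import random
from collections import Counter


def mixed_distance(x_num, c_num, x_cat, c_cat, gamma):
    """Euclidean distance on numeric parts plus gamma times Hamming distance."""
    numeric = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_num, c_num)))
    categorical = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * categorical


def cluster_mixed(nums, cats, k, gamma=0.5, tol=1e-6, max_iter=100, init=None):
    """Cluster samples given numeric (nums) and categorical (cats) features."""
    # Step 1: choose k samples as initial centers (random unless init is given).
    picks = list(init) if init is not None else random.sample(range(len(nums)), k)
    centers = [(list(nums[i]), list(cats[i])) for i in picks]
    prev_loss = float("inf")
    labels, loss = [], float("inf")
    for _ in range(max_iter):
        # Step 2: assign each sample to the closest center.
        labels, loss = [], 0.0
        for x_num, x_cat in zip(nums, cats):
            dists = [mixed_distance(x_num, c_num, x_cat, c_cat, gamma)
                     for c_num, c_cat in centers]
            best = min(range(k), key=dists.__getitem__)
            labels.append(best)
            # Step 4: the loss is the sum of distances to the assigned centers.
            loss += dists[best]
        # Step 5: stop once the loss changes by less than tol.
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
        # Step 3 (assumed rule): mean for numeric features, mode for categorical.
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue  # keep the old center for an empty cluster
            c_num = [sum(nums[i][f] for i in members) / len(members)
                     for f in range(len(nums[0]))]
            c_cat = [Counter(cats[i][f] for i in members).most_common(1)[0][0]
                     for f in range(len(cats[0]))]
            centers[j] = (c_num, c_cat)
    return labels, centers, loss


# Made-up data: two well-separated groups with one categorical feature.
nums = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
cats = [["a"], ["a"], ["a"], ["b"], ["b"], ["b"]]
labels, centers, loss = cluster_mixed(nums, cats, k=2, gamma=0.5, init=[0, 3])
print(labels)
```

Because the result depends on the initial centers, a practical run would repeat the procedure from several random starts and keep the solution with the lowest loss.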

Analysis of cluster characteristics and portrait construction
The number of clusters is chosen according to the statistical properties of the clusters and the requirements of portrait construction. The silhouette coefficient combines cohesion with separation. For each cluster, the cluster center is described in detail, and the mean and variance of the group are calculated [8]. The clusters are described from three aspects: basic attributes, daily routines, and daily consumption. Within each cluster, a finer subdivision is made according to the average course score, which makes the cluster portraits more complete and clear.
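The per-cluster statistics behind a portrait can be sketched as below. The sketch assumes one numeric feature and one cluster label per student, and it uses the population variance. The function name `cluster_profile` and the sample values are hypothetical and are not the paper's data.

```python
from statistics import mean, pvariance


def cluster_profile(values, labels):
    """Return size, mean, and population variance of a feature per cluster."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {
        label: {"n": len(vs), "mean": mean(vs), "variance": pvariance(vs)}
        for label, vs in groups.items()
    }


# Made-up weighted scores and cluster labels, for illustration only.
scores = [70.0, 80.0, 90.0, 85.0, 95.0]
groups = ["B", "B", "B", "D", "D"]
profile = cluster_profile(scores, groups)
print(profile)
```

Running the same summary for every feature gives the table of means and variances from which each group's portrait is described.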

Data preprocessing
This study used anonymized data on undergraduates at one university. The data include static information such as students' basic information and enrollment status [9]. They also cover 2021-2022 records of academic performance, book borrowing, living expenses, awards, penalties and loans, Internet access, and access control.
Data quality has a significant effect on the model's results, so the data must be preprocessed before model analysis.
Preprocessing mainly removes general user records, incomplete Internet access and grade records, and samples lacking basic information. Numeric data are compared using the Euclidean distance [10]. The scales of indicators such as the weighted score of compulsory courses and the average number of books borrowed per month differ greatly. Max-min normalization is therefore applied to reduce the influence of differing scales on clustering. The numeric attributes of each sample are normalized using the following equation:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x'$ is the normalized value of the feature, $x$ is the unnormalized value, $x_{\max}$ is the maximum value of the feature over all samples, and $x_{\min}$ is the minimum value of the feature.
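Max-min normalization of one feature can be sketched as follows. The handling of a constant feature, which is mapped to 0.0 to avoid division by zero, is an assumption and is not taken from the paper. The function name and the sample scores are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant feature carries no information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up weighted scores, for illustration only.
scores = [60.0, 75.0, 90.0]
normalized = min_max_normalize(scores)
print(normalized)
```

After this step a feature measured in points and a feature measured in books per month both lie in [0, 1], so neither dominates the Euclidean distance merely because of its scale.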

Data description
Students' basic information, consumption records, and academic affairs records are analyzed. The basic information mainly includes students' background and their stage of party membership development. Consumption information is compiled from campus one-card transactions in different periods, linked through the one-card and IP login accounts [11]. The academic data from the teaching management system include the number of failed subjects and the average score of compulsory courses. Table 1 lists the categorical and numeric features of the 2020 preparatory students. It contains 27 features, including the basic principles of Marxism and the primary content and methods of socialist ideological and political education with Chinese characteristics [12]. Because the means of data collection were limited, this study only covers college students who use the campus one-card.

Cluster evaluation
The number $K$ of student groups is determined using the silhouette coefficient, which combines cohesion with separation and is used to evaluate the effectiveness of clustering. The silhouette coefficient of sample $i$ is
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$
where $a(i)$ is the average distance between the $i$-th sample and all other samples in the same cluster, which quantifies cohesion within the cluster, and $b(i)$ is its average distance to the samples of the nearest other cluster, which quantifies separation between clusters. $\bar{s}$ is the average silhouette coefficient over all samples. If $s(i)$ is less than 0, the average distance between sample $i$ and the elements of its own cluster is larger than its average distance to the nearest other cluster, which indicates a poor clustering. If $a(i)$ tends to 0, or $b(i)$ is much greater than $a(i)$, then $s(i)$ tends to 1, which indicates the best clustering.
This study applies the clustering method to the 2020 graduates. First, the silhouette coefficient is used to study the effect of hierarchical clustering. The algorithm is then compared with K-means and mean-shift clustering. The results of the comparison are shown in Figure 2.
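The silhouette coefficient can be computed directly from its definition, as in the sketch below. It assumes purely numeric points, Euclidean distance, and at least two clusters. The paper's mixed distance could be substituted for `dist`. Assigning 0.0 to a sample that is alone in its cluster is a common convention adopted here as an assumption. The names and sample points are hypothetical.

```python
import math


def silhouette_scores(points, labels):
    """Return the silhouette coefficient of every sample (needs >= 2 clusters)."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Made-up one-dimensional points forming two tight, distant clusters.
points = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
mean_s = sum(scores) / len(scores)
print(scores, mean_s)
```

Computing the mean score for each candidate number of clusters and keeping the value with the highest mean is one way to choose the number of clusters.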

Figure 2. Comparison of Silhouette Coefficients
It can be seen from Figure 2 that the algorithm used in this paper clusters considerably better than the general K-means algorithm. The advantages of hierarchical clustering are most evident when the data contain both numeric and categorical attributes [13]. Clustering performance is best when the number of clusters K is 4. The students were therefore clustered into Groups A, B, C, and D using the hierarchical clustering algorithm with K=4. Table 2 is obtained from statistics on the proportion of students and the weighted score of compulsory courses. High achievers are students with a weighted score above 85 in elective subjects for each year, whereas low achievers have a weighted average below 60 in elective subjects for the semester. Different types of students were examined at different levels. The proportion of students in each category and the shares of high and low achievers differ from group to group.
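The achievement thresholds can be turned into a small helper for computing the share of each band within a group. The cut-offs of 85 and 60 come from the text. The band names, the "middle" band for scores in between, and the sample scores are assumptions made for illustration.

```python
def achievement_band(weighted_score):
    """Classify a weighted score using the cut-offs quoted in the text."""
    if weighted_score > 85:
        return "high"
    if weighted_score < 60:
        return "low"
    return "middle"  # assumed name for scores from 60 to 85


def band_shares(scores):
    """Return the fraction of scores falling in each band."""
    shares = {"high": 0.0, "middle": 0.0, "low": 0.0}
    for score in scores:
        shares[achievement_band(score)] += 1.0 / len(scores)
    return shares


# Made-up weighted scores for one group, for illustration only.
shares = band_shares([90.0, 86.0, 70.0, 55.0])
print(shares)
```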

Analysis of students' ideological and political education portraits based on data mining
The hierarchical clustering method is used to study group portraits of college students. A cluster analysis was conducted on the data of the 2020 graduates with K=4. From the categorical and numeric features of each cluster, the ideological and political literacy portraits of college students are established (Table 3-Table 6). The numeric features are presented in tables. Each portrait is described from three aspects: basic attributes, daily consumption, and daily routines.
According to the cluster analysis, four portraits of college students were established. The boys-to-girls ratio is the ratio of the number of boys to the number of girls in the group. The proportion intending to join the party is the percentage of party applicants, party activists, and development targets in the total number of party members. The award rate is the percentage of students awarded scholarships. The difficulty ratio is the percentage of students from families with general or special difficulties. The means and variances of the student clusters were compared with each other and with the overall numeric features of the 2020 graduates in Table 2. In this way, the characteristics of the different types of students can be analyzed and defined. The evaluation criteria are based on factors such as the average score of ideological and political courses and students' desire to join the party. Table 4 shows the 197 students in Group B. Their weighted score in compulsory courses is 76.16, the lowest of the four groups, and their scores in ideological and political courses and physical education are also the lowest (15.44).
Students with poor academic performance account for 15.88% of this group. Group B has the highest proportion of boys among the four groups. Compared with other students, its members make more prominent use of leisure applications and spend more time on online games. In this study, the students in Group B are labeled "night owls." Judging from the characteristics of this portrait, the proportion of "night owls" is very high. They place less emphasis on learning. Their family backgrounds are relatively good, their Internet expenses are relatively high, their time online is long, and their daily routines are irregular. This suggests that the Internet has had a negative effect on these students. They can therefore be treated as a priority group for attention. The school should issue academic warnings to them and offer courses such as college and career planning, so as to cultivate good study habits and lifestyles.
There are 1458 students in Group C (Table 5), and the weighted average score of their compulsory courses is 80.23. Their performance in ideological and political courses and physical education is above average. The proportion of students with good academic performance is relatively large, and the proportion with poor academic performance is relatively small. They keep regular hours: they eat breakfast most often and spend the least time online in the evening. These students have good living habits and are labeled the "regular lifestyle" group.
Group D (Table 6) has a total of 1259 students, and the weighted average score of their compulsory courses is 85.27. Students with excellent performance account for a large proportion in every category. Their results in ideological and political courses and physical education are the best, and their scholarship rate is the highest. This group also has a larger share of students with family difficulties, with a difficulty ratio of 0.35. Students in Group D can therefore be labeled "thrifty."

Conclusion
This paper clusters college students using a composite measure of Hamming distance and Euclidean distance. The silhouette coefficient is then used to evaluate the effectiveness of clustering. Finally, the method is compared with the K-means and mean-shift clustering algorithms. The results show that the hierarchical clustering method has certain advantages for processing educational data.