Quantifying the health benefits of physical activity for groups based on cluster analysis
Published online: 19 Mar 2025
Received: 23 Oct 2024
Accepted: 09 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0467
© 2025 Huan Huang et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
As living standards and quality of life improve, people increasingly pursue both physical and mental health. Social development places higher demands on students’ quality and ability: to succeed in a fiercely competitive market, students need to improve their abilities comprehensively, and mental health strengthens their adaptability. Cultivating a healthy psychology in students is therefore of great help to their future work and life [1–5]. Physical exercise programs aim to give students not only a healthy body but also a healthy mind, to enrich their spiritual and cultural life, and to support their all-round development. A large body of evidence shows that physical exercise is one of the most effective ways to promote physical and mental health [6–9]. Because physical activity improves the physical fitness of college students and also directly affects their mental health, Chinese college sports take enhancing students’ physical fitness and improving their physical and mental health as the main teaching task [10–13]. As the reform of college sports teaching deepens and colleges and universities pay more attention to mental health education, the mutually reinforcing relationship between college sports and students’ mental health is increasingly recognized. From the perspectives of educational psychology and mental hygiene, the comprehensive development of college students is closely related to their mental health [14–16]. Without mental health it is difficult to develop morally, intellectually, and physically; in turn, comprehensive moral, intellectual, and physical development effectively promotes mental health.
Some scholars have found that physical education contributes 52.15% to college students’ mental health, which suggests a very close connection between physical exercise and group health [17–19]. To promote the interaction between college sports and students’ health benefits, cluster analysis can be used to quantify the group health benefits of physical exercise and to propose specific operational strategies for integrating health education into college sports teaching, providing a basis and reference for deepening the reform of college physical education and improving group health benefits [20–22].
To explore the impact of physical activity on group health benefits, this paper proposes research hypotheses with the help of cluster analysis and first determines the data source and variable selection. In setting up the research model, propensity score matching is used to remove the bias caused by self-selection, and a balance test is used to check whether the selected variables are suitable for the regression analysis of group health benefits. The Bivariate Probit model is used to calculate the marginal effects of the core explanatory variable, and the possible endogeneity of the explanatory variable is addressed with the two-stage least squares model. Using the China Health National Tracking Survey database as the study sample, the paper completes the baseline regression analysis and robustness test, uses cluster analysis to classify groups by their motivation for physical activity, and carries out a heterogeneity test on this basis.
This paper focuses on exploring the effects of physical activity on group health benefits and proposes specific hypotheses as follows.
Hypothesis 1: Physical exercise improves group health and enhances self-assessed health.
Hypothesis 2: Physical exercise has a positive effect on both the physical and the mental health of groups, with a greater effect on physical health.
Hypothesis 3: Physical activity has a positive effect on the physiological health of both the active and the enjoyment groups, whereas its promotion of mental health is evident for the active group but not for the enjoyment group.
Hypothesis 4: Physical activity promotes the physiological and psychological health of both the profit-seeking and the depressed groups, with a more significant effect on the psychological health of the depressed group.
This paper uses the China Health National Tracking Survey database for empirical analysis. The database contains a wide range of information, from socioeconomic status to health status, covers 28 provinces, municipalities, and autonomous regions in China, and adopts a stratified multi-stage PPS sampling strategy. As of the most recent national tracking survey in 2023, the sample covered 21,000 respondents in 13,800 households. It represents both urban and rural areas nationwide, and its data quality is widely recognized.
This study uses quantitative analysis. Based on the research purpose and with reference to previous studies, appropriate explanatory variables are selected and the data are cleaned. Because individuals in the data differ across time and region, the regression analysis needs to control for time and regional economic variables.
The core explanatory variable of this paper is the physical activity status of the group. Physical activity is planned, regular, and repetitive bodily activity aimed at developing the body, improving health, and strengthening physical fitness.
The World Health Organization defines health as a state of complete physical, mental, and social well-being.
Physical health: Overall health status is reflected by self-assessed health questions. This paper focuses on whether the group is healthy or not, so very good, good, and average health are categorized as physically healthy and assigned a value of 1, while bad and very bad are categorized as physically unhealthy and assigned a value of 0.
Mental health: Mental health status comprises two parts, depressive status and cognitive ability. The depressive tendency component was measured using the Center for Epidemiologic Studies Depression Scale (CES-D) score from the CHARLS questionnaire.
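The binary coding of self-assessed health described above can be sketched as follows. The response labels and the function name `code_physical_health` are assumptions for illustration; the survey's actual response codes may differ.

```python
# Hypothetical response labels for the five self-assessed health categories.
HEALTHY = {"very good", "good", "average"}
UNHEALTHY = {"bad", "very bad"}


def code_physical_health(response: str) -> int:
    """Return 1 for very good/good/average health and 0 for bad/very bad."""
    key = response.strip().lower()
    if key in HEALTHY:
        return 1
    if key in UNHEALTHY:
        return 0
    # Anything else (missing, refused, unknown label) is not silently coded.
    raise ValueError(f"unrecognized self-assessed health response: {response!r}")


if __name__ == "__main__":
    for answer in ["Very good", "average", "bad"]:
        print(answer, "->", code_physical_health(answer))
```

Raising an error on unrecognized answers, rather than coding them as 0, keeps missing responses from being counted as ill health.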
In order to reduce possible biases in the statistical model due to the omission of variables, three types of variables were controlled for in the empirical analysis: demographic characteristics (age, gender), indicators of socio-economic status (level of education, work status, number of properties in the household), and social support status (marital status, number of cohabitants, health insurance).
This paper draws on the research methods of other scholars to study the effect of physical activity on group health based on propensity score matching.
To better eliminate self-selection bias, this paper uses nearest-neighbor matching under PSM and restricts the sample to the “common support” interval, so that sample individuals are as similar as possible in their observable baseline characteristics [23].
The specific idea of using propensity score matching in this paper is to obtain the conditional probability of participating in physical activity, i.e., the “propensity score”, through the Probit model, and then use nearest-neighbor matching for pairing and testing [24]. The focus of this paper is on the extent to which physical activity improves the health of the participants, i.e., the average treatment effect on the treated (ATT) in the model [25].
For any individual, there are two intervention states in the model: participating in physical activity and not participating.
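In the usual potential-outcomes notation, the propensity score and the ATT can be written as below. The symbols $D_i$ (participation indicator), $X_i$ (observed covariates), $Y_{1i}$ and $Y_{0i}$ (health outcome with and without physical activity), and $p(X_i)$ are notational assumptions introduced here for illustration.

```latex
\begin{aligned}
p(X_i) &= \Pr\left(D_i = 1 \mid X_i\right), \\
\mathrm{ATT} &= E\left[\, Y_{1i} - Y_{0i} \mid D_i = 1 \,\right]
             = E\left[\, Y_{1i} \mid D_i = 1 \,\right] - E\left[\, Y_{0i} \mid D_i = 1 \,\right].
\end{aligned}
```

The second expectation is never observed directly, since participants are only seen in the participating state; matching estimates it from non-participants with similar propensity scores.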
Because the explained variables and the core explanatory variable used in this paper are derived from categorical (yes/no) items in the CHARLS questionnaire, the regression model involves a large number of discrete variables and is not suited to traditional multiple regression for continuous variables. The more appropriate discrete Bivariate Probit regression model is therefore used to analyze the effect of physical activity on group health benefits [26]. The commonly used linear probability model is:
In this model,
Where
The core explanatory variable of this paper, whether or not one engages in physical activity, is likely to be endogenous, which could easily lead to biased and inconsistent estimates, so it needs to be handled with caution.
To address this problem, this paper estimates the effect of physical activity on group health benefits with the instrumental variable method. The instrumental variable must be an exogenous variable that is correlated with participation in physical activity (exercise in this study) but uncorrelated with the disturbance term. That is, it needs to fulfill the following two basic conditions:
(1) Correlation. The instrumental variable should be correlated with the potentially endogenous variable.
(2) Exogeneity. The instrumental variable should be independent of the omitted variables.
In this paper, the coverage rate of physical activity is finally chosen as the instrumental variable, see equation (8). This variable is calculated as the proportion of people who are physically active in the province where the individual is located.
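A minimal sketch of how such a provincial coverage rate could be computed is given below. The record layout, the function name `provincial_coverage`, and the optional `leave_one_out` variant (which excludes the individual's own behavior from the rate) are assumptions for illustration and are not described in the paper.

```python
from collections import defaultdict


def provincial_coverage(records, leave_one_out=False):
    """Return one coverage rate per record.

    records: list of (province, exercises) pairs, with exercises in {0, 1}.
    leave_one_out: if True, exclude the individual's own value from the
    provincial proportion (an assumed variant, not taken from the paper).
    """
    totals = defaultdict(int)   # respondents per province
    active = defaultdict(int)   # physically active respondents per province
    for province, exercises in records:
        totals[province] += 1
        active[province] += exercises

    rates = []
    for province, exercises in records:
        n, k = totals[province], active[province]
        if leave_one_out:
            # Undefined when the province has a single respondent.
            rates.append((k - exercises) / (n - 1) if n > 1 else None)
        else:
            rates.append(k / n)
    return rates


if __name__ == "__main__":
    sample = [("A", 1), ("A", 0), ("A", 1), ("B", 0), ("B", 1)]
    print(provincial_coverage(sample))
    print(provincial_coverage(sample, leave_one_out=True))
```

The plain proportion follows the description in the text; the leave-one-out form is a common precaution against the instrument mechanically containing the individual's own choice.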
Both the explanatory variable of interest in this paper (whether or not one has been physically active in the past year) and the potential endogenous variable (exercise) are binary. The relevant econometric literature suggests that when the variables are mostly binary dummies, multivariate regression models for continuous variables are no longer applicable, whereas the Bivariate Probit regression model is designed for binary dummy variables and gives a better fit. Therefore, for the dichotomous variables selected in this paper, the Bivariate Probit regression model is used for the instrumental variable estimation. The model is structured in the following form:
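A recursive Bivariate Probit model with an instrument is commonly written as shown below. All symbols are notational assumptions introduced for illustration: $E_i$ is the exercise indicator, $Z_i$ the provincial coverage rate, $H_i$ the binary health outcome, and $X_i$ the control variables.

```latex
\begin{aligned}
E_i^{*} &= \gamma Z_i + X_i'\delta + u_i, & E_i &= \mathbf{1}\left(E_i^{*} > 0\right), \\
H_i^{*} &= \beta E_i + X_i'\theta + \varepsilon_i, & H_i &= \mathbf{1}\left(H_i^{*} > 0\right), \\
\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix}
&\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).
\end{aligned}
```

In this form the two equations are estimated jointly, and a correlation $\rho$ different from zero indicates that exercise is endogenous in the health equation.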
The results of the ordered logistic regression of group health benefits on physical activity are shown in Table 1. Models 1 and 2 are baseline models that examine the relationship between the primary independent variable (physical activity) and the group’s physical and mental health without control variables. Models 3 and 4 add control variables on this basis, and Model 5 replaces the dependent variable to test the robustness of the regression results. All models are statistically significant overall. In the baseline models, the regression coefficients of physical exercise are 1.021 for physical health and 0.765 for mental health, both significant at the 1% level, indicating a significant positive correlation between physical exercise and the group’s physical and mental health. After control variables are added in Models 3 and 4, the coefficient of physical exercise is 0.621 for physiological health and 0.352 for mental health, and both remain significant at the 1% level. Physical exercise therefore promotes both the physiological and the mental health of the group, and its effect on physiological health is larger than its effect on mental health.
To check the robustness of the results, Model 5 replaces the dependent variable in the benchmark regression with individual self-assessed health and examines whether the relationship between physical activity and group health remains significant. As Table 1 shows, the estimated coefficient of physical exercise on self-assessed health is 0.522 and is significant at the 1% level. Physical exercise thus also improves the group’s self-assessed health, which supports the credibility of the preceding regression results. Hypothesis 1 is verified.
Table 1. Benchmark regression analysis
Variable | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 |
---|---|---|---|---|---|
Dependent variable | Physiological health | Mental health | Physiological health | Mental health | Self-assessed health |
Physical exercise | 1.021*** | 0.765*** | 0.621*** | 0.352*** | 0.522*** |
 | (0.075) | (0.075) | (0.083) | (0.084) | (0.082) |
Age | - | - | -0.007* | 0.027*** | 0.008 |
 | - | - | (0.006) | (0.006) | (0.006) |
Gender | - | - | 0.155** | 0.116* | 0.165** |
 | - | - | (0.065) | (0.066) | (0.065) |
Education | - | - | 0.231*** | 0.243*** | 0.187*** |
 | - | - | (0.025) | (0.026) | (0.025) |
Marital status | - | - | 0.086 | 0.266*** | 0.09 |
 | - | - | (0.077) | (0.078) | (0.078) |
Work status | - | - | 0.158** | -0.134* | 0.097 |
 | - | - | (0.073) | (0.073) | (0.074) |
Number of properties | - | - | 0.015 | -0.015 | -0.025 |
 | - | - | (0.023) | (0.023) | (0.023) |
Health insurance | - | - | -0.217* | 0.156 | -0.057 |
 | - | - | (0.122) | (0.122) | (0.123) |
Number of cohabitants | - | - | 0.217*** | 0.255*** | 0.213*** |
 | - | - | (0.053) | (0.053) | (0.051) |
Observations | 3519 | 3519 | 3519 | 3519 | 3519 |
Pseudo R2 | 0.017 | 0.013 | 0.035 | 0.031 | 0.026 |
Note: Standard errors are in parentheses. ***, **, and * denote significance at the 1%, 5%, and 10% levels, respectively.
Estimates obtained from regression analysis alone may still be biased. This section uses the PSM method to address the endogeneity problem arising from self-selection. Three matching methods, namely nearest-neighbor matching, radius matching, and kernel matching, are used for validation, and the ATT estimates obtained from them are shown in Table 2.
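The nearest-neighbor variant can be illustrated with a minimal sketch. It assumes the propensity scores have already been estimated, uses one-to-one matching with replacement, and trims treated units to a min/max common-support interval. The function name `att_nearest_neighbor`, the optional `caliper`, and the toy data are hypothetical and are not the paper's implementation.

```python
def att_nearest_neighbor(treated, control, caliper=None):
    """Estimate the ATT by 1:1 nearest-neighbor matching with replacement.

    treated, control: lists of (propensity_score, outcome) pairs.
    caliper: optional maximum allowed score distance for a match.
    """
    if not treated or not control:
        raise ValueError("both groups must be non-empty")

    # Common support: overlap of the two groups' score ranges.
    lo = max(min(p for p, _ in treated), min(p for p, _ in control))
    hi = min(max(p for p, _ in treated), max(p for p, _ in control))

    differences = []
    for p_t, y_t in treated:
        if not lo <= p_t <= hi:
            continue  # treated unit lies outside the common support
        # Closest control unit by propensity score.
        p_c, y_c = min(control, key=lambda unit: abs(unit[0] - p_t))
        if caliper is not None and abs(p_c - p_t) > caliper:
            continue  # nearest neighbor is still too far away
        differences.append(y_t - y_c)

    if not differences:
        raise ValueError("no treated unit could be matched")
    return sum(differences) / len(differences)


if __name__ == "__main__":
    exercisers = [(0.30, 1), (0.60, 1), (0.95, 0)]
    non_exercisers = [(0.20, 0), (0.32, 1), (0.58, 0), (0.70, 1)]
    print(att_nearest_neighbor(exercisers, non_exercisers))
```

Radius matching would instead average over all controls within the caliper, and kernel matching would weight all controls by their score distance; both differ from this sketch only in how the comparison outcome is built.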
Table 2. Average treatment effects on the treated (ATT)
Matching method | ATT (physiological health) | Standard error | T | ATT (mental health) | Standard error | T |
---|---|---|---|---|---|---|
Before matching | 0.66 | 0.045 | 13.93 | 0.407 | 0.06 | 10.12 |
After matching | - | - | - | - | - | - |
Nearest-neighbor matching | 0.492 | 0.051 | 5.29 | 0.224 | 0.168 | 3.17 |
Radius matching | 0.301 | 0.074 | 5.55 | 0.089 | 0.074 | 2.9 |
Kernel matching | 0.28 | 0.044 | 5.55 | 0.139 | 0.044 | 3.01 |
Before matching, the average treatment effects on the treated (ATT) for the group’s physiological and mental health were 0.66 and 0.407, respectively. After controlling for selection bias, the ATT for physiological health under nearest-neighbor, radius, and kernel matching was 0.492, 0.301, and 0.28, respectively, with a mean of 0.358, and the ATT for mental health was 0.224, 0.089, and 0.139, respectively, with a mean of 0.151. Controlling for the same confounding variables, physical exercise thus raises the group’s physiological health by an average of 0.358 units and mental health by an average of 0.151 units. Engaging in physical exercise still significantly promotes physiological and mental health, and the effect on physiological health is larger. Hypothesis 2 is verified. The comparison also shows that leaving the selection bias in the benchmark model untreated would overestimate the effect of physical activity on the group’s physiological and psychological health.
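The reported means can be checked directly from the three matched estimates in Table 2. The short sketch below only reproduces that arithmetic; the variable and function names are illustrative.

```python
# Matched ATT estimates as reported in Table 2.
att_physiological = {"nearest neighbor": 0.492, "radius": 0.301, "kernel": 0.28}
att_mental = {"nearest neighbor": 0.224, "radius": 0.089, "kernel": 0.139}


def mean_att(estimates):
    """Simple average of the ATT estimates across matching methods."""
    return sum(estimates.values()) / len(estimates)


if __name__ == "__main__":
    print(round(mean_att(att_physiological), 3))  # 0.358
    print(round(mean_att(att_mental), 3))         # 0.151
```

Both averages lie well below the unmatched values of 0.66 and 0.407, which is the overestimation the text refers to.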
To verify the quality of the matching, a balance test was carried out; the results are shown in Table 3. The standardized bias (%bias) of the covariates after matching is generally required to be below 10% in absolute value. The results in the table show that the standardized bias of every covariate after matching is at most 5.5% in absolute value, well below 10%, so the balance test is passed and the model estimation is supported. The regression of group health benefits on physical activity is therefore robust and convincing.
Table 3. Test of matching effect
Variable | Treated | Control | %Bias | T | P>T |
---|---|---|---|---|---|
Age | 66.897 | 66.862 | 0.5 | 0.11 | 0.83 |
Gender | 0.516 | 0.51 | 1.8 | 0.48 | 0.665 |
Marriage | 0.831 | 0.836 | 0.5 | 0.09 | 0.826 |
Education | 2.917 | 2.964 | -4.6 | -1.44 | 0.146 |
Hukou | 0.204 | 0.248 | -5.5 | -1.66 | 0.065 |
Endowment insurance | 0.216 | 0.22 | -2.5 | -0.77 | 0.348 |
ADL | 0.037 | 0.027 | 0 | 0 | 1 |
IADL | 0.15 | 0.18 | -2.7 | -0.79 | 0.452 |
Living mode | 0.941 | 0.929 | 1.3 | 0.35 | 0.706 |
Community facilities | 0.87 | 0.874 | 1.8 | 0.72 | 0.502 |
Chronic disease | 0.749 | 0.781 | -3 | -0.81 | 0.448 |
Physical pain | 0.594 | 0.594 | 1.5 | 0.39 | 0.635 |
Life satisfaction | 3.982 | 3.945 | 4.2 | 1.26 | 0.223 |
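The %bias column is conventionally the standardized mean difference between the treated and control groups. A minimal sketch of that statistic and of the 10% rule is given below, under the assumption that the usual Rosenbaum and Rubin definition is used; the function names are hypothetical, and the list of biases is copied from Table 3.

```python
import math


def standardized_bias(mean_treated, mean_control, var_treated, var_control):
    """Standardized bias in percent: mean difference over the pooled SD."""
    pooled_sd = math.sqrt((var_treated + var_control) / 2.0)
    return 100.0 * (mean_treated - mean_control) / pooled_sd


def passes_balance(biases, threshold=10.0):
    """True if every covariate's absolute %bias is below the threshold."""
    return all(abs(b) < threshold for b in biases)


# %Bias values after matching, as reported in Table 3.
TABLE3_BIAS = [0.5, 1.8, 0.5, -4.6, -5.5, -2.5, 0, -2.7, 1.3, 1.8, -3, 1.5, 4.2]

if __name__ == "__main__":
    print(max(abs(b) for b in TABLE3_BIAS))  # largest absolute bias
    print(passes_balance(TABLE3_BIAS))
```

The variances needed to recompute each bias are not reported in the paper, so the sketch applies the threshold check to the published %bias values only.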
To understand the internal and external motivation of the group to engage in physical activity, the sample was divided into strong and weak motivation groups. The strengths of internal and external motivation were used as clustering variables, and a two-step cluster analysis was carried out in SPSS 26.0. The optimal number of clusters was judged by the Schwarz Bayesian criterion (BIC) and was automatically determined by SPSS to be 4; the silhouette measure of the clustering was 1.0, indicating a good clustering effect. Based on the cluster sizes and the cluster summary, the sample was grouped into four combination types: strong external and weak internal motivation, strong external and strong internal motivation, weak external and weak internal motivation, and weak external and strong internal motivation, which were named the profit-seeking, active, depressed, and enjoyment types, respectively. The sports behaviors of groups with different motivation types are compared in Figure 1. In the figure, I1~I4 denote the physical activity behavior indicators (physical activity time, frequency, intensity, and number of physical activity items), while T1~T4 denote the active, enjoyment, profit-seeking, and depressed groups, respectively. As the figure shows, groups with different motivation types differ in physical activity time (P < 0.001), frequency (P < 0.001), intensity (P < 0.05), and number of physical activity items (P < 0.001).
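The naming of the four clusters amounts to a two-by-two lookup on motivation strength, which the sketch below illustrates. It is a labeling rule only, not the two-step clustering itself: the numeric `cutoff` separating strong from weak motivation and the function name `motivation_type` are assumptions, since the paper derives the groups from the SPSS clustering rather than from a fixed threshold.

```python
# (strong external, strong internal) -> cluster name used in the paper.
MOTIVATION_TYPES = {
    (True, False): "profit-seeking",  # strong external, weak internal
    (True, True): "active",           # strong external, strong internal
    (False, False): "depressed",      # weak external, weak internal
    (False, True): "enjoyment",       # weak external, strong internal
}


def motivation_type(external, internal, cutoff=3.0):
    """Label a respondent by motivation strength.

    external, internal: motivation scores on a common scale.
    cutoff: hypothetical score at or above which motivation counts as strong.
    """
    return MOTIVATION_TYPES[(external >= cutoff, internal >= cutoff)]


if __name__ == "__main__":
    print(motivation_type(4.5, 2.0))
    print(motivation_type(2.0, 4.5))
```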
Significant differences in physical activity behavior appeared when the active group and the enjoyment group were each compared with the depressed group. In contrast, when the profit-seeking group was compared with the depressed group, a significant difference was found only in the amount of time spent in physical activity. Specifically, the depressed group performed worst in physical activity behavior, the profit-seeking group performed slightly better than the depressed group, and the enjoyment and active groups performed significantly better than the depressed group, but there was no significant difference between the enjoyment group and the active group on any of the four physical activity behavior indicators.

Figure 1. Comparison of physical exercise behavior
The misbehavior indicators in physical exercise for groups with different motivation types are compared in Figure 2. In the figure, B1~B6 denote six indicators: misbehavior in physical education class, misbehavior in the physical fitness standard test, misbehavior in extracurricular competitions, misbehavior in club activities, misbehavior in independent exercise, and the total score of sports misbehavior, respectively. As the figure shows, groups with different motivation types differ significantly in misbehavior in physical education class (p<0.001), in the physical fitness standard test (p<0.01), in extracurricular competitions (p<0.001), and in club activities (p<0.05), as well as in the total score of sports misbehavior (p<0.01), whereas the difference in misbehavior in independent exercise is not significant (p>0.05). In terms of the total score, the profit-seeking group shows the most sports misbehavior, followed by the depressed group and then the enjoyment group, and the active group shows the least.

Figure 2. Comparison of sports misconduct
In the previous section, cluster analysis was used to classify the groups in the sample into four motivation types, namely profit-seeking, active, depressed, and enjoyment, and the physical activity behaviors and misbehavior of these groups were compared. This section further discusses the heterogeneity of the effects of physical activity on the physiological and psychological health of groups with different motivation types. Table 4 presents the results of the heterogeneity test by motivation type.
Table 4. Results of heterogeneity testing
Outcome | Item | Active type | Enjoyment type | Profit-seeking type | Depressed type |
---|---|---|---|---|---|
Physiological health | Physical exercise | 0.715*** | 0.362** | 0.466*** | 0.745*** |
 | | (6.92) | (2.47) | (3.92) | (6.51) |
 | bdiff | - | 0.351** | - | 0.351** |
 | | - | (0.019) | - | (0.055) |
 | Control variable | Control | Control | Control | Control |
Mental health | Physical exercise | 0.384*** | 0.218 | 0.228* | 0.435*** |
 | | (3.69) | (1.35) | (1.83) | (3.82) |
 | bdiff | - | 0.182 | - | -0.21 |
 | | - | (0.146) | - | (0.1) |
 | Control variable | Control | Control | Control | Control |
The effect of physical activity on physiological health is significant at the 1% level for the active group, with an estimated coefficient of 0.715, and at the 5% level for the enjoyment group, with an estimated coefficient of 0.362, so physical activity promotes physiological health in both groups. For mental health, the estimated coefficient for the active group is 0.384 and is significant at the 1% level, whereas the estimated coefficient for the enjoyment group is 0.218 and is not significant. Physical activity therefore promotes the mental health of the active group only, and its effect on the enjoyment group is not significant. This confirms the hypothesis in the research design regarding the heterogeneity of the active and enjoyment groups, and Hypothesis 3 is verified. Compared with the enjoyment group, physical activity has a stronger promotional effect on the physical and mental health of the active group.
The estimated coefficient of the effect of physical exercise on physiological health is 0.466 for the profit-seeking group and 0.745 for the depressed group, both significant at the 1% level, indicating that physical exercise promotes the physiological health of both groups. Physical exercise also promotes mental health in both groups: the estimated coefficient is 0.228 for the profit-seeking group, significant at the 10% level, and 0.435 for the depressed group, significant at the 1% level. The effect is thus stronger for the depressed group, and Hypothesis 4 is confirmed.
This paper studies the impact of physical activity on group health benefits. With cluster analysis as an auxiliary tool, it formulates research hypotheses, determines the data source, research methods, and variable selection, and sets up the research model by combining the Bivariate Probit model and the two-stage least squares model. Using the China Health National Tracking Survey database as the research sample, the baseline model yields regression coefficients of physical activity on the group’s physical and mental health of 1.021 and 0.765, respectively, both positive and significant at the 1% level. After control variables are added, the coefficients of physical activity on physical and mental health remain positive and significant at the 1% level. When individual self-assessed health is used as the dependent variable, the estimated coefficient of physical activity is 0.522 and is significant at the 1% level, which supports the credibility of the regression results. Controlling for the same confounding variables, physical exercise increases the group’s physical and mental health by an average of 0.358 and 0.151 units, respectively, and the standardized bias of the covariates after matching is at most 5.5% in absolute value, below the 10% threshold, so the balance and robustness tests are passed. Cluster analysis divides the sample into four motivation groups, namely profit-seeking, active, depressed, and enjoyment, and the heterogeneity test is carried out on this basis.
Physical exercise significantly promotes the physiological health of the active and enjoyment groups (at the 1% and 5% levels, respectively), while its effect on mental health is significant for the active group but not for the enjoyment group. In the profit-seeking and depressed groups, physical activity has a positive promotional effect on both physiological and mental health, and the effect on mental health is stronger in the depressed group.