The Use of Cluster Analysis for Non-Continuous Variables in the Assessment of Dietary Behaviours and Physical Activities in Primary School Children

Abstract
Physical activity, along with proper nutrition, is a very important element in child development. Lack of everyday, regular physical activity among young people is a public health problem. The aim of the study was to use cluster analysis to assess the relationship between nutrition and physical activity levels of primary school children. The study included 682 students from randomly selected elementary schools and was performed using a proprietary questionnaire during the 2013/2014 school year. The questionnaire contained questions about eating habits and physical activity, as well as the socio-economic conditions of families. Clusters of students with similar dietary habits were identified using cluster analysis and subsequently compared in terms of physical activity level. We identified four clusters, characterized by relative internal homogeneity and at the same time by variability between one another in terms of the number of meals eaten throughout the day and the time of their consumption. The most important characteristic of Cluster 1 was eating four meals a day including breakfast, which is the most important meal of the day. The diets of children in Cluster 2 abounded with raw vegetables and fruits. Students in Cluster 3 were characterized by a regular and varied diet. The least appropriate nutritional behaviour was observed among students belonging to Cluster 4. Cluster analysis in the studied population allowed relationships between dietary habits and physical activity to be described. Using the UIAF indicator (Moderate to Intense Physical Activity), a statistically significant association between the children's eating habits and their physical activity levels was observed. A sufficient level of physical activity was most frequently observed among students belonging to Cluster 3, whereas only a small percentage of children belonging to Cluster 4 reached a high level of physical activity. An average level of physical activity was observed in a high percentage of children belonging to Cluster 4. Low levels of physical activity were most frequently observed in Clusters 4 and 1 and least frequently observed in Cluster 3. All of the identified active forms of free-time activity were most commonly reported in Cluster 3. The study supports a beneficial relationship between students' eating behaviours and physical activity.


Introduction
During childhood, an unbalanced diet and insufficient physical activity may interfere with normal growth and maturation, and can also lead to developmental disorders, weight disorders, and related emotional difficulties (Graf et al., 2014; Hinkley et al., 2012; Leech et al., 2014; Pilch et al., 2011; Ward et al., 2010). Civilization and technological progress have decreased the levels of physical activity in children and adolescents in relation to their biological needs. Factors contributing to this decrease include less spontaneous physical activity for fun, sedentary games, easy access to a TV and computer, and the frequent, unnecessary use of modes of transport such as car and bus (Hills et al., 2015; Nettlefold et al., 2011; Rey-López et al., 2010).
Various studies have suggested a number of shortcomings in terms of physical activity in young people that can lead to many health problems, and the universality of their occurrence requires work in the field of public health (Hills et al., 2011; Jodkowska et al., 2015; Stamatakis et al., 2015). Problems associated with lifestyle-related diseases are crucial for most developed countries. According to the World Health Organization, sedentary behaviour is now considered a direct or indirect cause of death, especially in developed countries (WHO, 2010). Using DALYs (disability-adjusted life years), it has been estimated that the health of over 8.3 million people in the WHO European Region deteriorates each year due to insufficient physical activity, and about 1 million deaths per year are directly related to lack of physical activity (Lee et al., 2012).
Human eating behaviour is a multidimensional concept and contains an entire range of different components, such as number of meals and their regularity, food preparation methods, snacking, the presence of diet-specific products, and variety of dishes. These factors create a set of interrelations, depending on the profile of a person's eating behaviour (Jarosz et al., 2011; Wojtyła-Buciora et al., 2015). This leads to methodological difficulties in the study of this phenomenon, as analysis of individual components, in isolation from others, limits interpretation and narrows the context of the analysis.
The aim of the study was to use cluster analysis in the assessment of the relationship between nutrition and physical activity levels of children attending primary school.

Methodology of the Study
The study involved 682 students aged 9-12 from randomly selected elementary schools in the Podlasie region, and was performed using a proprietary questionnaire during the 2013/2014 school year. Girls accounted for 54.1% (369 pupils), while boys made up 45.9% (313 pupils) of the population. Children from the city accounted for 75.1% and children from the countryside 24.9%.
The questionnaire contained questions about dietary practices, including regularity of diet (e.g., number of meals and snacking), frequency of consumption of selected groups of food products and dishes (e.g., fruits, vegetables, sweets, and fast food), and questions about physical activities and leisure activities. Selected socioeconomic factors, including the number of people in the household, family status, parents' education, and place of residence, were also evaluated.
The percentage of parents who had completed only primary education was low: 11.1% of mothers and 12.2% of fathers had primary education as their highest level. The percentage of mothers with vocational education as their highest level was 16.0% and that of fathers was 35.0%. In total, 42.5% of mothers and 35.2% of fathers had concluded their formal education with secondary education, and 30.4% of mothers and 17.6% of fathers had completed higher education. Social and living conditions of families were evaluated as "very good" by 30.2% of respondents, as "good" by 68.2% and as "bad" by 1.6%. Financial situation was indicated to be "very good" in 18.0%, "good" in 48.7%, "average" in 30.6%, "poor" in 2.0%, and "very poor" in 0.4% of families.
Physical activity level was evaluated using the UIAF (Moderate to Intense Physical Activity) indicator (average number of days per week, in which subjects allocated at least 60 minutes to physical activity) (Prochaska et al., 2001). A minimum number of 5 days indicates that the amount of physical activity engaged in is sufficient to meet the needs of young people. This index is considered to be average when the average number of days with sufficient physical activity ranges from 3 to 4.5 days and low when the range is between 0 and 2.5 days. The questions concerned how much time had been spent on moderate to high-intensity physical activity during the previous week. Moderate activity was defined as activity that increases heart rate, such as brisk walking, dancing, and gardening. Vigorous activity was defined as activity that significantly increases heart and breathing rates, such as jogging or cycling uphill.
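The thresholds described above can be illustrated with a short sketch. The function name `uiaf_level` and the handling of values between 2.5 and 3 days (not covered by the stated ranges, and here assigned to the low category) are assumptions made for illustration only, not part of the study's procedure.

```python
def uiaf_level(days_per_week):
    """Classify an average number of active days per week (0 to 7).

    Thresholds follow the text: at least 5 days is sufficient,
    3 to 4.5 days is average, 0 to 2.5 days is low.
    Assumption: values strictly between 2.5 and 3 are treated as low.
    """
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    if days_per_week >= 5:
        return "sufficient"
    if days_per_week >= 3:
        return "average"
    return "low"


# Hypothetical values, not study data.
for days in (6, 4.5, 2.5):
    print(days, uiaf_level(days))
```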
Prior consent was obtained from the management of each institution before the study began. Before the test, parents/guardians were informed about the goals and methods of the study. Consent was also obtained from the Bioethics Committee of the Medical University of Bialystok. Participation in the survey was anonymous.
In order to determine statistically significant differences between distinguished parameters in subgroups of children, Pearson's chi-square independence test was used. The level of significance in all calculations was indicated by p < 0.05.
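For readers unfamiliar with the test, the following minimal sketch shows how the Pearson chi-square statistic for a contingency table is computed. The counts are hypothetical and are not the study data; the function name is likewise an assumption. The study itself used SPSS, so this is only an illustration of the calculation, and it assumes every row and column total is non-zero.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts, NOT the study data: rows are two groups of pupils,
# columns are "sufficient" and "not sufficient" physical activity.
example = [[10, 20], [20, 10]]
stat, df = chi_square_independence(example)

CRITICAL_005_DF1 = 3.841  # chi-square critical value for df = 1, alpha = 0.05
print(f"chi2 = {stat:.3f}, df = {df}, significant = {stat > CRITICAL_005_DF1}")
```

The statistic is compared with the chi-square distribution with the returned degrees of freedom; a statistics package would normally report the corresponding p-value directly.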
Statistical analysis included cluster analysis to identify relatively homogeneous groups of cases characterized by similar dietary habits. Clustering was performed using TwoStep Cluster Analysis. This method can handle categorical variables, unlike the widely used k-means method, which is restricted to continuous variables (Banfield et al., 1993; Huang, 1998). Additionally, the TwoStep method automatically determines the number of clusters, which is a further advantage over k-means, where the number of clusters must be chosen a priori (Fraley et al., 1998). Clustering was based on dietary habits described by a number of variables, including the number of meals eaten per day, regularity of meal consumption, habits of eating at school, tendency to snack between meals, and presence of a variety of foods and dishes in the diet. The specific characteristics of the resulting clusters were presented using descriptive statistics.
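To give a feel for clustering on categorical variables, the sketch below uses k-modes, a categorical variant of k-means, as a stand-in. It is not the TwoStep procedure used in the study: unlike TwoStep, it requires the number of clusters in advance, and it uses a simple mismatch count as the dissimilarity. The variable layout, the records, and all function names are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent category in each column of equal-length records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, k, max_iter=20):
    """Cluster tuples of categories; returns (labels, modes)."""
    # Deterministic start: the first k distinct records act as initial modes.
    modes = list(dict.fromkeys(records))[:k]
    labels = None
    for _ in range(max_iter):
        # Assign each record to the nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each mode from its current members.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical pupils: (meals per day, eats breakfast, snacks, sweets intake).
pupils = [
    ("5", "yes", "yes", "low"),
    ("4", "no", "yes", "high"),
    ("5", "yes", "yes", "low"),
    ("4", "no", "no", "high"),
    ("5", "yes", "no", "low"),
    ("4", "no", "yes", "high"),
]
labels, modes = k_modes(pupils, k=2)
print(labels)
print(modes)
```

Each resulting mode can be read as the "typical" profile of a cluster, which mirrors how the clusters are described in terms of their dominant dietary habits.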
Extracted clusters were compared for a number of categorical and ordinal variables. Statistical associations were evaluated using Pearson's χ² test. Additionally, all categories were compared in pairs with tests for two proportions adjusted for multiple comparisons with Bonferroni correction. The generalized estimating equations model was used to investigate the relationship between the number of days spent on physical activity and dietary clusters, accounting for clustering of data at the school level. Models were estimated assuming an exchangeable structure of the working correlation matrix. The covariance matrix was estimated using the Huber-White sandwich estimator. The model assumed a linear relationship between response variables and predictors (normal distribution with identity link function). Statistical hypotheses were verified at the 0.05 significance level. All calculations were performed using IBM SPSS Statistics for Windows, Version 20.0 (IBM Corp., Armonk, NY, USA).
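The pairwise comparisons with Bonferroni correction can be sketched as follows. This assumes a pooled two-proportion z-test, which is one common form; the text does not specify the exact variant used in SPSS. The counts (32 of 82 and 13 of 95) are illustrative values chosen only to match proportions of about 39.0% and 13.7%; the actual group sizes are not stated in this section. The number of comparisons (all pairs of four clusters) is also an assumption.

```python
import math


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value of a pooled z-test for two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


# Assumption: every pair of the four clusters is compared.
n_pairs = math.comb(4, 2)

# Illustrative counts only (see the note above).
p_raw = two_proportion_p(32, 82, 13, 95)
p_adj = bonferroni(p_raw, n_pairs)
print(f"raw p = {p_raw:.5f}, Bonferroni-adjusted p = {p_adj:.5f}")
```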

Results
Four clusters were identified, characterized by a relative internal homogeneity and at the same time variability between one another with regard to the selected behaviour in terms of diet (Table 1). Cluster 1 was characterized by eating four meals a day. Students belonging to this cluster most commonly consumed first breakfast. Compared to other clusters, students from Cluster 1 were least likely to eat between meals and ate fewer sweets. Cluster 2 students ate five meals a day. They ate their first breakfasts and ate between meals. The diet of pupils belonging to this cluster most often (compared to the other groups of surveyed children) contained raw vegetables and fruits. Cluster 3 was characterized by eating five meals a day at fixed times. The students of this cluster most commonly ate school meals. All students of this cluster ate snacks between meals. Fruits, sweets, and fast-food dishes were included in their daily diets. Cluster 4 students ate four meals a day. Students belonging to this cluster ate on an irregular basis. Their daily diets demonstrated a high intake of sweets and low intake of fruits and vegetables.
In summary, the most important characteristics observed for the separate clusters were the beneficial nutritional patterns found in Clusters 1, 2, and 3. Cluster 1 was characterized by eating four meals a day including breakfast, which is the most important meal of the day. Students in Cluster 2 ate raw vegetables and fruits. Cluster 3 was characterized by regular eating patterns (five meals a day), and the diet included vegetables and fruits as well as sweets and fast food. Cluster 4 was considered inappropriate both in terms of eating pattern and in terms of nutrition; that is, the pupils in this cluster had a diet that lacked vegetables and fruits.
These clusters were subsequently compared regarding physical activity level to determine the relationship between diet and physical activity level. A statistically significant (p = 0.004) association between the children's eating habits and physical activity levels, according to the UIAF indicator, was observed (Table 2). A sufficient level of physical activity was most frequently observed among pupils belonging to Cluster 3 (39.0%). This percentage was 28.1% in Cluster 1 and 26.7% in Cluster 2. In Cluster 4, only 13.7% of children reached a sufficient level of physical activity, a percentage which was significantly lower compared to Cluster 1 (p = 0.035) and Cluster 3 (p = 0.001). An average level of physical activity was found in 69.5% of the children in Cluster 2, in 59.4% in Cluster 1, and in 56.1% in Cluster 3. According to UIAF ratings, low levels of physical activity were most frequently observed in Clusters 4 (16.8%) and 1 (12.5%). This level of physical activity occurred in 4.9% of the students in Cluster 3. No significant differences regarding average and low levels of physical activity were observed between pupils belonging to different clusters. The clusters also differed significantly in participation in physical education (PE) classes at school (p < 0.001; Table 3). Cluster 3 featured the highest percentage of students who always participated in PE classes at school (92.7%), with the percentage of students in Cluster 1 who always attended PE not far behind (91.1%). In Cluster 2, 87.5% of students always took part in PE classes. In Cluster 4, 69.8% of children took part in PE classes, a percentage which was significantly lower compared to Cluster 1 (p < 0.001), Cluster 2 (p < 0.001), and Cluster 3 (p = 0.001).
The highest percentage (26.0%) of students who participated in most, but not all, lessons was in Cluster 4, where this was more frequent than in Cluster 1 (p < 0.001), Cluster 2 (p = 0.002), and Cluster 3 (p = 0.002). Similarly, a lack of regularity in attendance was observed in Cluster 4. There were no significant differences between clusters in the use of exemptions from participation in PE classes. Significant differences were observed between the clusters in terms of leisure-time activities (Table 4), namely outdoor activities (p = 0.012), walks (p = 0.005), and helping at home (p = 0.004). These activities were most frequent among students in Cluster 3: outdoor activities (86.6%), significantly more often than in Cluster 4 (p = 0.005); walks (70.7%), significantly more often than in Cluster 4 (p = 0.002); and helping at home (64.6%), significantly more often than in Cluster 2 (p = 0.023) and Cluster 4 (p = 0.002). Sedentary activities during leisure time were associated with watching television programs and sometimes spending time in front of a computer. All students reported such kinds of leisure activities.
It is possible that the obtained results (the association between nutrition clusters and physical activity) were caused by different distributions of these variables in different schools. To exclude the possibility of confounding by school, an additional analysis was performed. The generalized estimating equations model, which accounts for clustering of data at the school level, allowed us to demonstrate the relationship between nutrition clusters and physical activity levels independently of school (Table 5). The model includes the number of days spent on physical activity as the outcome variable and the categorical cluster variable as an independent factor, with the school variable defining the primary sampling unit. Results obtained in this analysis were consistent with the previous findings. Compared to children from Cluster 1, children from Cluster 3 were expected to spend significantly more days per week on physical activity (p < 0.001), while children from Cluster 4 were expected to spend significantly fewer days (p = 0.046).
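As a sketch of the model just described, and under stated assumptions, it can be written out as follows. The symbols are introduced here for illustration and do not appear in the source: Y_ij is the number of days of physical activity for pupil j in school i, C_k is an indicator of membership in Cluster k, and alpha is the common within-school correlation of the exchangeable working correlation structure. Cluster 1 is taken as the reference category because the reported comparisons are made against Cluster 1.

```latex
\begin{aligned}
\mathrm{E}[Y_{ij}] &= \beta_0 + \beta_2\,C_{2,ij} + \beta_3\,C_{3,ij} + \beta_4\,C_{4,ij}, \\
\mathrm{Corr}(Y_{ij},\,Y_{ik}) &= \alpha \qquad (j \neq k).
\end{aligned}
```

In this notation, the reported findings correspond to a positive coefficient for Cluster 3 and a negative coefficient for Cluster 4, each relative to Cluster 1.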

Discussion
The study aimed to evaluate diet and physical activity levels of primary school students. They were assigned to separate clusters, which differed in terms of the analysed dietary behaviours. The results of this study indicated a number of deficits in children who did not have nutritious food and lacked physical activity.
The use of cluster analysis helped to identify four groups of children with different eating habits. These groups did not always differ with respect to all of the variables included in the analysis. Some groups were similar in terms of certain characteristics and at the same time differed with respect to others. Analysis of people's behaviour by similar subgroups is particularly important for public health. At an individual patient level, the doctor involved in a single patient's care can tailor their activities based on individual variables characterizing that patient. In public health, where actions are aimed at groups of people, such an individual approach is not possible. Cluster analysis allows sub-populations with similar profiles to be identified, so that appropriate corrective action can be taken with respect to each of them.
The existence of sub-populations that differ from cluster to cluster can be explained by latent variables that cannot be measured directly but that determine the subjects' behaviour and translate into a range of observed differences. In the case of this study, the cluster membership variable defined through cluster analysis can be treated as such a latent variable, one which describes a set of beliefs and external factors (e.g., parents' convictions regarding their children) that determine decisions made concerning eating habits. It can be assumed that this variable can also affect other decisions relating to lifestyle, for example, physical activity.
The improper eating habits of primary school children included a lack of regularity in eating basic meals throughout the day, not eating school meals, frequently eating between meals, and consuming diets containing significantly more sweets and crisps and fewer raw vegetables. Poor eating habits of school-going children have frequently been observed by other authors (Krenc et al., 2011). Students with poor diets often have difficulty concentrating and acquiring knowledge and show reduced physical activity. Subjects with beneficial nutritional behaviour show a more active lifestyle. This relationship has also been confirmed by other studies (Fleig et al., 2015; Jodkowska et al., 2013b; Lippke et al., 2012; Lowry et al., 2015).
According to the World Health Organization, children between the ages of 5 and 17 should devote at least 60 minutes every day to moderate or intense physical activity (WHO, 2010). In this study, the levels of physical activity in a population of children were analysed. Only 26.8% of the children included met recommended levels of physical activity, while 62.3% reported average levels and 10.9% fell into the category of those with a low level of physical activity. The highest percentage of students fulfilling the recommendations belonged to Cluster 3 (39.0%) and the lowest to Cluster 4 (13.7%). Similar results regarding the implementation of the recommendations of physical activity were obtained in a study conducted among children in Warsaw by Mańczak et al. (2013). In that study, among children attending primary school, 25% showed a high level of physical activity, 55% an average activity level, and 20% a low activity level. Generally, physical activity levels decline with age. Results of the HBSC (Health Behaviour in School-Aged Children) study, conducted in Poland in 2010, showed that in a representative sample of pupils, the recommended level of physical activity was observed in only 27.3% of 11-12-year-olds, 17.8% of 13-14-year-olds, 16.2% of 15-16-year-olds and 10.3% of 17-18-year-olds (Mazur et al., 2011). Only about 10% of adults and 30% of children and adolescents practice forms of activity whose duration, type and intensity meet the needs of physiological stress (Ministerstwo Zdrowia, 2007).
Participation in PE classes is necessary if students are to maintain the recommended level of physical activity. In our study, 86.8% of the children surveyed regularly attended PE classes. In Cluster 3, 92.7% of the students participated regularly in PE classes, while in Clusters 1, 2, and 4, the percentages of students who participated regularly in PE were 91.1%, 87.5%, and 69.8%, respectively. Students belonging to Cluster 3 did not use exemptions from these lessons. Żwirska et al. (2013) conducted a study among 1,140 primary school children and observed that 98.6% of students regularly participated in PE lessons. A study performed by the Central Statistical Office, conducted among children aged 6-7, showed that 96% of children participated in PE classes (Główny Urząd Statystyczny, 2008). The Supreme Audit Office report of 2013 concerning PE in schools, however, draws attention to the increasing number of students being exempted from PE classes. Nearly 3,000 students in the inspected schools were permanently or temporarily exempted from PE classes. In addition, results of the inspection indicated that in the next stages of education, the active participation of children and youth in PE classes decreased (Najwyższa Izba Kontroli, 2013). In 2013, in response to the problem of PE class attendance, the Ministry of Sport and Tourism launched a public campaign "Stop dismissals in WF," which aims to encourage children and youth to systematically participate in PE lessons by, inter alia, diversifying the courses to make them more attractive for students (Rozporządzenie Ministra Edukacji Narodowej z dnia 7 lutego 2012 r. w sprawie ramowych planów nauczania w szkołach publicznych, 2012).
According to the Central Statistical Office, about 20% of 6-7-year-olds did not show any physical activity outside school PE classes (Główny Urząd Statystyczny, 2011). Over 50% of students included in our survey took classes of a sporting nature. Such classes were most often attended by children in Cluster 3 (62.2%) and least often by children in Cluster 4 (48.5%).
School-aged children who have a sedentary lifestyle watch television, often use a computer, and do homework. It should be noted that the time spent on various sedentary activities accumulates. In addition, the pattern of leisure activity tends to persist from childhood and young age to adulthood (Jodkowska et al., 2013a; Tremblay et al., 2011). Our study revealed that watching TV or using the computer in free time was observed in 51.8% of students. In a study by Stankiewicz et al. (2010), children aged 6-13 pointed out that their most popular leisure time activity was watching TV (24%). Spending time watching television programs may be associated with adverse nutritional behaviour, including an increase in the total pool of energy supplied from the diet (Agostoni et al., 2011; Vader et al., 2009). Our findings indicate that students belonging to Cluster 4 showed behaviours in eating and physical activity that carried the most health risk. Cluster 1 represented students who had a diet deficient in nutrients and who engaged in more physical activity. Cluster 2 represented proper eating habits but less physical activity. Analysis of behaviours related to physical activity in the studied group of children suggests that watching television or using a computer is not always in competition with physical activity, as confirmed by our results. Students from Cluster 3 devoted their free time to both computer classes and physical activity. This means that children can find time for both sedentary and physical activities. This finding is comparable to other studies that reported that some children who spend a lot of time on team sports also spend a lot of time on sedentary activities (Jago et al., 2011). Similarly, Jodkowska et al. (2013a) showed that sedentary behaviour does not compete with physical activity among teenagers.

Conclusions
The use of cluster analysis allowed us to identify subgroups of students with different dietary behaviours. This method of analysis also made it possible to assess the relationship between dietary behaviours and physical activity level, but not in the form of a linear relationship between individual characteristics, as in the case of conventional methods. The cluster analysis demonstrated how different dietary behaviours are associated with different, and often conflicting, behaviours in terms of physical activity. Identifying groups of people in a population also allows appropriate health policy recommendations to be addressed to each of them. Such recommendations would be much less effective if directed at the population average.
A beneficial relationship between students' dietary behaviours and physical activity has been shown. Patterns of physical activity and preferred ways of spending free time become apparent in childhood and can become a habit. An increase in children's physical activity is influenced by the family environment, including the knowledge, motivation, and activities of parents, and by the school. Encouraging physical activities at home, at school, and