The impact of attention to social information on the development of socialization in childhood
Article category: Research Article
Publication date: 07 June 2025
Pages: 44–53
DOI: https://doi.org/10.2478/sjcapp-2025-0005
© 2025 Toru Fujioka et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Development is an extremely complex process that arises from the cumulative consequences of interactions and transactions among various systems; this is referred to as a developmental cascade (1). The development of socialization also proceeds through this complex process. It has been suggested that early-emerging behavioral symptoms alter children's self-directed patterns of attention, changing their experience of the environment and further restricting social learning opportunities (2).
To clarify one aspect of this complex development, previous studies have used eye-tracking systems to investigate whether attention to social information influences later socialization. Young et al. (2009) conducted a longitudinal study of 108 infants aged 6 months, of whom 55 had siblings with autism spectrum disorder (hereinafter "autism"), a condition characterized by unusual eye contact and other social communication difficulties; 43 had typically developing older siblings; and 10 had an older sibling with some developmental delay. They found that attention to the mouth while viewing their mother's still-face video at 6 months was positively associated with social skills at 24 months, although this association was no longer significant after controlling for language skills (3). In a study of children with autism, the degree of looking at the inner face region while viewing activity movies at age 2 was not related to the socialization score at age 3 (4). Another study of children with autism used stimuli that simultaneously presented movies of people and geometric shapes (a preference paradigm) and found that fixation time on the geometric shapes at 2 years of age was not associated with social skills at 8 years (5). Taken together, these studies suggest that attention to social information, as measured in these eye-tracking paradigms, does not strongly influence the development of socialization.
However, several aspects of these studies need to be addressed. The first concerns participant characteristics. Previous studies have focused on children with autism, who spend less time looking at people and at people's eyes than typically developing children do (6,7,8,9), or on their siblings, who exhibit relatively high autistic-like traits (10). To clarify the development of socialization, it is necessary to examine how attention to social information affects it in typically developing children rather than in children with autism. The second concerns participants' age and the duration of follow-up. In early childhood, a variety of abilities develop in sequence and various skills are acquired. Young et al. reported that attention to social information presented through still faces at 6 months partially predicted social skills at 24 months of age (3). As socialization develops, children around three years of age begin to shift from parallel play, in which they play independently, to associative play, which is a group activity (11), and become able to perform theory of mind tasks (12). The development of social-cognitive abilities during early childhood is extremely complex, and knowledge in this area needs to accumulate, especially through longitudinal studies that can clarify causal relationships (13). Eye-tracking research can be conducted by having participants simply watch stimuli without verbal instruction (14), and it is minimally invasive (15, 16). Eye-tracking systems are therefore among the most suitable tools for research on infants and young children, and numerous studies have indeed focused on infants (16,17,18,19).
Using eye-tracking systems to clarify, in a safe environment, the causal relationships between attention to social information and dynamic changes in socialization in early childhood is therefore important for understanding how socialization develops. The third concern, which applies to eye-tracking research in general, is that each study used its own researcher-created stimuli. The studies mentioned above used face stimuli (Young et al. (3) and Campbell et al. (4)) and a preference paradigm (Bacon et al. (5)). Even within face stimuli, the proportion of gaze directed to the eyes versus the mouth differs depending on whether the mouth is moving (6). Such differences in stimuli may have affected the results of previous studies and may substantially hinder the accumulation of research findings. To draw conclusions about the influence of attention to social information on the development of socialization, further research needs to be conducted under a variety of conditions, using comparable stimuli where possible.
Therefore, the present study aimed to determine how attention to social information affects the development of socialization among typically developing children after three years of age, assessed at three time points separated by one to two years. We targeted this age range for two main reasons. First, previous longitudinal studies on this topic covered children from 6 months to 2 years of age (3) and from 2 to 3 years of age (4), so we considered it important to clarify the relationship between attention to social information and the development of socialization after the age of 3 years. Second, although one study reported that fixation time on geometric shapes at 2 years of age was not associated with social skills at 8 years (5), various factors may accumulate in complex ways over such a long interval; we therefore decided to examine this relationship within a narrower age range. Furthermore, to address the problem of different studies using different stimuli, we used the commercially available Gazefinder (JVC KENWOOD Corporation, Kanagawa, Japan), an all-in-one eye-tracking system that bundles the hardware with stimulus videos (face stimuli, biological motion, preference paradigm, and finger pointing). Additionally, we measured participants' autistic-like traits using the Social Responsiveness Scale–Second Edition (SRS-2) (20) and intelligence quotient using the Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV) (21) to control for these effects while exploring the typical relationship between attention to social information and the development of socialization.
We recruited 24 typically developing preschool children (14 boys and 10 girls) from the local community, including via local nursery schools and universities, in prefectures with a population of approximately 700,000. We distributed leaflets and emails to local community members and asked those who were interested to apply. Participants therefore constitute a convenience sample and may not be representative of the population. All participants were of Japanese ethnicity. Using the face sheet of the questionnaire, we confirmed that no participant had been flagged for any disorder at the routine medical checkups for 1.5-year-old and 3-year-old children conducted by pediatricians and public health nurses in Japan, had been diagnosed with a mental disorder, or had an active disease requiring continuous hospital visits. Caregivers or legal guardians completed the Strengths and Difficulties Questionnaire (SDQ) (22) regarding their child's behavioral characteristics; the SDQ comprises subscales for emotional symptoms, conduct problems, hyperactivity/inattention, peer problems, and prosocial behavior, and has been standardized in Japan (23, 24). The SDQ results for this study's participants are presented as supplementary information.
We used Gazefinder, an all-in-one eye-tracking system that bundles hardware and stimulus videos, to evaluate the percentage of fixation time allocated to specific objects on a video monitor. Participants' eye positions were measured using infrared light sources and cameras located below a 19-inch thin-film transistor (TFT) monitor (1280 × 1024 pixels). Eye position was recorded using the corneal reflection technique as (X, Y) coordinates at 50 Hz (3000 samples/min). Eye-position recordings were calibrated using a five-point method. The recommended distance between the participant's face and the monitor is approximately 70 cm.
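At 50 Hz, each gaze sample represents 20 ms, so time spent looking at a region can be estimated by counting samples. The Python sketch below illustrates that arithmetic under stated assumptions: gaze samples are (x, y) pixel tuples, samples with no detected eye position are `None`, and the area of interest is a hypothetical axis-aligned rectangle. None of these names or data structures come from Gazefinder, whose internal processing is not described here.

```python
# Minimal sketch of 50 Hz gaze-sample arithmetic (not Gazefinder's own code).
# Assumptions: samples are (x, y) pixel tuples; undetected samples are None;
# the AoI is a hypothetical rectangle (left, top, right, bottom) in pixels.
from typing import Optional, Sequence, Tuple

Sample = Optional[Tuple[float, float]]
Rect = Tuple[float, float, float, float]

SAMPLING_HZ = 50  # sampling frequency reported for Gazefinder


def samples_per_minute(hz: int = SAMPLING_HZ) -> int:
    """Number of gaze samples recorded in 60 s."""
    return hz * 60


def seconds_in_aoi(samples: Sequence[Sample], aoi: Rect, hz: int = SAMPLING_HZ) -> float:
    """Time (s) spent inside the AoI; each sample lasts 1/hz seconds."""
    left, top, right, bottom = aoi
    hits = sum(
        1
        for s in samples
        if s is not None and left <= s[0] <= right and top <= s[1] <= bottom
    )
    return hits / hz


def percent_fixation(samples: Sequence[Sample], aoi: Rect,
                     stimulus_seconds: float, hz: int = SAMPLING_HZ) -> float:
    """Time in the AoI divided by stimulus duration, expressed as a percentage."""
    return 100.0 * seconds_in_aoi(samples, aoi, hz) / stimulus_seconds


# Example: a 4 s clip yields 200 samples; 120 of them fall inside the AoI.
clip = [(640.0, 300.0)] * 120 + [(100.0, 900.0)] * 80
print(samples_per_minute())                                # 3000
print(percent_fixation(clip, (400, 200, 880, 400), 4.0))  # 60.0
```

Here 120 samples correspond to 2.4 s, which is 60% of the 4-s clip.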
After calibration, Gazefinder presented five types of videos: (A) human faces without mouth motion, (B) human faces with mouth motion, (C) biological motion of a human, (D) preference paradigm, and (E) finger pointing. (A) Human faces without mouth motion included movies of a still face (4 s), eye blinking (an actor repeatedly opens and closes her eyes for 5 s), and a still face (an actor with a still face appears for 5 s; this movie was presented after the mouth-moving face movie described below). (B) Human faces with mouth motion included movies of a mouth-moving face (an actor repeatedly opens and closes her mouth for 5 s) and a talking face (7 s). In the talking face movie, the actress says "Konnichiwa" ("Hello"), "Onamaewa?" ("What is your name?"), and "Issyoniasobouyo" ("Let's play together"). Facial stimuli are considered representative social stimuli (25), and humans appear to naturally attend to the face, especially the eye area (26). The difference between (A) and (B) is that a moving mouth draws attention away from the eye region: previous studies have reported that attention to the mouth region increases when individuals are shown talking faces or faces with the mouth in motion (26, 27). In particular, a study of adults using Gazefinder (27) reported that the percentage of fixation time on the eyes for a still image, blinking, and a still face presented after blinking was significantly lower in the autism group than in the typically developing group, and that the percentage of fixation time on the mouth was higher in the autism group for blinking and for the still face presented after blinking.
By contrast, there were no significant differences between the autism and typically developing groups in the percentage of fixation time on the eyes and mouth for a mouth-moving face and a talking face. It was therefore reasonable to combine the movies in (A) human faces without mouth motion and in (B) human faces with mouth motion into composite indices. Correlations between the AoIs of each face stimulus before combination are presented as supplementary information (Tables S2–S4). (C) The biological motion (BM) movie presented upright and inverted biological motion simultaneously for 11 s, accompanied by the song "Under the Big Chestnut Tree," to which the upright figure danced. BM movies measure the degree of attention to social information based on the hypothesis that humans show an innate preference for BM that facilitates adaptive interactions with other living beings (28). In addition, a brain imaging study in adults suggested that the brain areas activated by inverted biological motion differ from those activated by upright biological motion (29). (D) The preference paradigm movies simultaneously showed people and geometric shapes of the same size (20 s; four videos of 5 s each), as well as geometric shapes in a small window embedded in movies of people (16 s; two videos of 8 s each). The preference paradigm measures the degree of attention paid to social information under the hypothesis that individuals with autism prefer to attend to highly repetitive rather than social images (30). (E) The finger-pointing videos presented objects with or without finger pointing (8 s; two videos of 4 s each). Finger pointing establishes joint attention, a state of shared attention between two individuals focused on an object or event of interest (31), and is categorized as a representative social stimulus (25).
Therefore, the degree of attention paid to social information can be determined by measuring responses to these types of stimuli. Each stimulus was presented once, and the presentation order was randomly predetermined; all participants viewed the stimuli in the same order. A music box sound was played while the stimuli were presented, except for (C), the BM. Figure 1 shows a sample of the stimuli.

Figure 1. Gazefinder movie samples and their areas of interest (AoIs). (A) Screenshot of the human face without mouth motion; AoI-1 and AoI-2 include the eye and mouth regions, respectively; (B) Screenshot of the human face with mouth motion; AoI-1 and AoI-2 include the eye and mouth regions, respectively; (C) Screenshot of biological motion; AoI-1 and AoI-2 are the upright and inverted images, respectively; (D) Screenshot of the preference paradigm; AoI-1 and AoI-2 are people and geometry, respectively; (E) Screenshot of finger pointing; AoI-1 and AoI-2 are social and geometry areas, respectively.
The percentage of fixation time allocated to each area of interest (AoI) on the video monitor was calculated automatically (time spent in the area divided by the duration of stimulus presentation). The AoIs of each stimulus movie loaded on Gazefinder are set by default, so the experimenter did not need to change any settings to derive these percentages. Figure 1 shows the AoIs for each stimulus. For indices that combine several movies, namely (A) human faces without mouth motion, (B) human faces with mouth motion, and (D) the preference paradigm, we weighted each movie by its presentation time. For example, the percentage of fixation time on the eye region for (A) was calculated as the percentage for the still face multiplied by 4, plus the percentage for eye blinking multiplied by 5, plus the percentage for the still face after eye blinking multiplied by 5, with the sum divided by 14 (the total duration in seconds).
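The duration weighting described above amounts to a weighted mean with presentation times as weights. The sketch below assumes the per-movie percentages have already been obtained; the function name and the example percentages are hypothetical, while the durations (4, 5, and 5 s for the movies in (A); 5 and 7 s in (B)) are those given in the stimulus description.

```python
# Sketch of a duration-weighted composite index (hypothetical helper).
from typing import Sequence


def duration_weighted_percentage(percentages: Sequence[float],
                                 durations_s: Sequence[float]) -> float:
    """Weighted mean of per-movie fixation percentages, weights = movie length (s)."""
    if len(percentages) != len(durations_s) or not percentages:
        raise ValueError("need one duration per percentage")
    total = sum(durations_s)
    return sum(p * d for p, d in zip(percentages, durations_s)) / total


# (A) eye region: still face (4 s), eye blinking (5 s), still face after blinking (5 s).
# The percentages 50, 40, and 60 are made-up illustration values.
print(duration_weighted_percentage([50.0, 40.0, 60.0], [4, 5, 5]))  # (200 + 200 + 300) / 14 = 50.0
# (B) eye region: mouth-moving face (5 s), talking face (7 s); values are made up.
print(duration_weighted_percentage([30.0, 60.0], [5, 7]))           # 570 / 12 = 47.5
```

Weighting by duration makes the composite equal to the share of the total viewing time spent in the AoI, which is what a single concatenated movie would yield.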
Intellectual level was assessed using the WISC-IV (21), which yields a Full-Scale Intelligence Quotient (FSIQ) and four index scores (Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed), each with a mean of 100 and a standard deviation (SD) of 15.
To measure socialization, we used the Vineland Adaptive Behavior Scales, Second Edition (VABS-II) (32), which assesses adaptive behavior through a semi-structured interview with a respondent familiar with the participant. The VABS-II consists of four domains (communication, daily living skills, socialization, and motor skills), with domain scores having a mean of 100 and an SD of 15. The Japanese version is standardized for ages 0 years 0 months to 92 years 11 months (33).
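Because WISC-IV index scores and VABS-II domain scores share a mean of 100 and an SD of 15, any score can be re-expressed as a distance from the normative mean in SD units. A minimal sketch (the function names are illustrative):

```python
# Convert a standard score (mean 100, SD 15) to a z-score and back.
def standard_to_z(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Distance of a standard score from the normative mean, in SD units."""
    return (score - mean) / sd


def z_to_standard(z: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Standard score corresponding to a given z-score."""
    return mean + z * sd


print(standard_to_z(85))    # -1.0, i.e., one SD below the mean
print(standard_to_z(130))   # 2.0
print(z_to_standard(-1.0))  # 85.0
```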
Individuals with autism have been reported to show atypical attention to social information. We therefore used the SRS-2 (20), a parent-report questionnaire, to measure the degree of participants' autistic-like traits and ensure that such traits did not influence the results. The SRS-2 consists of 65 items; total scores range from 0 to 195, with higher scores indicating more severe social deficits. SRS-2 forms are available for preschool children (2.5–4.5 years), school-age children (4–18 years), and adults (≥19 years). We used the preschool form for participants aged 2–3 years and the school-age form for those older than 4 years. Based on Japanese cutoff scores, the cutoff for the preschool form was 48.5 (sensitivity, 0.83; specificity, 0.82) (34), and the cutoffs for the school-age form were 53.5 for boys (sensitivity, 0.91; specificity, 0.48) and 52.5 for girls (sensitivity, 0.89; specificity, 0.41) (35).
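The form selection and cutoff comparison described above can be sketched as a small lookup. The age boundary between forms is an assumption: the text assigns the preschool form to 2–3-year-olds and the school-age form to children older than 4, while the official form ranges overlap between 4 and 4.5 years; here children younger than 4 years get the preschool form. Treating a total at or above the cutoff as a positive screen is also an assumption; because the cutoffs end in .5 and totals are integers, ties cannot occur.

```python
# Sketch of SRS-2 screening against the Japanese cutoffs quoted in the text.
# Assumptions: ages < 4 years use the preschool form; "positive" means total >= cutoff.
CUTOFFS = {
    ("preschool", "boy"): 48.5,
    ("preschool", "girl"): 48.5,   # one preschool cutoff is reported for both sexes
    ("school-age", "boy"): 53.5,
    ("school-age", "girl"): 52.5,
}


def srs2_form(age_years: float) -> str:
    """Choose the SRS-2 form by age (boundary at 4 years is an assumption)."""
    return "preschool" if age_years < 4 else "school-age"


def at_or_above_cutoff(total: int, age_years: float, sex: str) -> bool:
    """True if an SRS-2 total score (0-195) reaches the relevant cutoff."""
    if not 0 <= total <= 195:
        raise ValueError("SRS-2 total must be between 0 and 195")
    return total >= CUTOFFS[(srs2_form(age_years), sex)]


print(at_or_above_cutoff(29, 3.5, "boy"))   # False
print(at_or_above_cutoff(53, 6.0, "girl"))  # True (cutoff 52.5)
print(at_or_above_cutoff(53, 6.0, "boy"))   # False (cutoff 53.5)
```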
The longitudinal study was conducted between 2016 and 2023 at three time points separated by one to two years, referred to as Waves 1, 2, and 3 in chronological order. Gazefinder recording, the VABS-II, and the SRS-2 were administered at all three waves; the WISC-IV was administered only at Wave 3. The study protocol was approved by the Ethics Committee of the University of Fukui and conformed to the tenets of the Declaration of Helsinki (revised in 2000). After a complete explanation of the study, all participants or their parents or legal guardians provided written informed consent.
Participants were to be excluded if their valid fixation time was below 70% (i.e., if Gazefinder could not detect the eye position for more than 30% of the stimulus presentation time). To explore the influence of attention to social information on socialization, we used a three-wave cross-lagged effects model (Figure 2) within a structural equation modeling (SEM) framework; this model provides information about the strength of the temporal relationships among variables, which is necessary for establishing causality (36). As social information, we analyzed the eye region for human faces with and without mouth motion, the upright image for biological motion, the people region for the preference paradigm, and the social area for finger pointing. Correlations between these indices and the VABS-II socialization score are included in the Supplementary Information (Tables S5–S9).
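The exclusion rule above can be expressed as a check on the share of samples with a detected eye position. This sketch assumes undetected samples are recorded as `None`; how Gazefinder actually flags lost samples is not described in the text.

```python
# Sketch of the < 70% valid-data exclusion rule (data format is an assumption).
from typing import Optional, Sequence, Tuple

MIN_VALID_PERCENT = 70.0  # threshold stated in the text


def valid_percentage(samples: Sequence[Optional[Tuple[float, float]]]) -> float:
    """Percentage of samples in which the eye position was detected."""
    if not samples:
        raise ValueError("no samples recorded")
    detected = sum(1 for s in samples if s is not None)
    return 100.0 * detected / len(samples)


def include_participant(samples, threshold: float = MIN_VALID_PERCENT) -> bool:
    """Keep a participant unless valid data fall below the threshold."""
    return valid_percentage(samples) >= threshold


recording = [(500.0, 400.0)] * 70 + [None] * 30
print(include_participant(recording))                             # True: exactly 70% valid
print(include_participant([(500.0, 400.0)] * 69 + [None] * 31))  # False: 69% valid
```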

Figure 2. Cross-lagged effects model of Gazefinder and VABS-II scores.
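To make the structure of the three-wave cross-lagged model concrete, the sketch below estimates each lagged path as a standardized regression coefficient, regressing each Wave t+1 variable on both Wave t variables. This is a simplification assumed for illustration, not the authors' AMOS analysis: AMOS estimates all paths jointly by maximum likelihood and also provides fit indices and significance tests, which the sketch omits. X stands for a Gazefinder index and Y for the VABS-II socialization score; the data in the usage example are toy values.

```python
# Simplified cross-lagged sketch: equation-by-equation standardized regression.
# Not a full SEM fit; no residual covariances, fit indices, or p-values.
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def standardized_betas(outcome, autoregressor, cross_lag) -> Tuple[float, float]:
    """Standardized OLS coefficients for outcome ~ autoregressor + cross_lag."""
    r_y1 = pearson(outcome, autoregressor)
    r_y2 = pearson(outcome, cross_lag)
    r_12 = pearson(autoregressor, cross_lag)
    denom = 1.0 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


def cross_lagged_paths(x: Dict[int, List[float]],
                       y: Dict[int, List[float]]) -> Dict[str, float]:
    """Autoregressive and cross-lagged paths for Waves 1->2 and 2->3."""
    paths = {}
    for t in (1, 2):
        paths[f"Y{t}->Y{t + 1}"], paths[f"X{t}->Y{t + 1}"] = standardized_betas(y[t + 1], y[t], x[t])
        paths[f"X{t}->X{t + 1}"], paths[f"Y{t}->X{t + 1}"] = standardized_betas(x[t + 1], x[t], y[t])
    return paths


# Toy data for four children, chosen so that Y at Wave 2 depends only on X at Wave 1.
x = {1: [1.0, 2.0, 3.0, 4.0], 2: [2.0, 1.0, 4.0, 3.0], 3: [3.0, 1.0, 2.0, 4.0]}
y = {1: [1.0, -1.0, -1.0, 1.0], 2: [2.0, 4.0, 6.0, 8.0], 3: [1.0, 3.0, 2.0, 5.0]}
for name, beta in cross_lagged_paths(x, y).items():
    print(f"{name}: {beta:.3f}")
```

In these toy data, Y at Wave 2 is a linear function of X at Wave 1 and X1 is uncorrelated with Y1, so the sketch returns X1->Y2 = 1.000 and Y1->Y2 = 0.000.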
Model fit was evaluated using four indices: the goodness of fit index (GFI), adjusted GFI (AGFI), comparative fit index (CFI), and root mean square error of approximation (RMSEA). GFI and AGFI values of 0.90 or higher were considered a good fit, and a CFI of 0.95 or higher was considered a good fit (37). An RMSEA of 0.05 or lower was considered a good fit, and a value between 0.05 and 0.08 an acceptable fit (38). Analyses were performed using SPSS 27 (IBM Corp., Armonk, NY, USA), and structural equation modeling analyses were performed using AMOS 27 (IBM Corp.). Recommendations for the sample size required for structural equation modeling vary: some state more than 100 participants, some 5–10 per variable, and some 30–460 (39). The sample size of this study did not meet these recommendations. Therefore, to interpret only relatively robust results, we considered only models that met all four criteria (GFI, AGFI, CFI, and RMSEA).
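The decision rule above, in which only models meeting all four criteria are interpreted, can be sketched as follows. The RMSEA helper uses the common formula √(max(χ² − df, 0) / (df·(N − 1))); some software uses N instead of N − 1, so treat it as an approximation of what AMOS reports. Whether the "acceptable" range (0.05–0.08) counted as meeting the RMSEA criterion is not stated, so the RMSEA threshold is left as a parameter. All numeric inputs below are hypothetical.

```python
# Sketch of the four-index fit rule; thresholds from the text, RMSEA cutoff configurable.
from math import sqrt


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA from a model chi-square, its degrees of freedom, and sample size N."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and N greater than 1")
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def meets_all_criteria(gfi: float, agfi: float, cfi: float, rmsea_value: float,
                       rmsea_max: float = 0.08) -> bool:
    """GFI and AGFI >= 0.90, CFI >= 0.95, RMSEA <= rmsea_max (0.05 for 'good' only)."""
    return gfi >= 0.90 and agfi >= 0.90 and cfi >= 0.95 and rmsea_value <= rmsea_max


print(rmsea(3.0, 4, 24))                           # 0.0: chi-square below df
print(meets_all_criteria(0.95, 0.91, 0.97, 0.04))  # True
print(meets_all_criteria(0.95, 0.85, 0.97, 0.04))  # False: AGFI too low
```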
No participant had a valid fixation time below 70% at any wave, so all participants were included in the analyses. Table 1 presents the descriptive statistics for each index at each time point. Apart from age (Wave 1 < 2 < 3), no index differed significantly across the three time points. The mean SRS-2 score was well below the Japanese cutoff (53.5 for boys and 52.5 for girls) (35) at all time points, and all participants had an FSIQ at Wave 3 of −1.0 SD or higher (i.e., 85 or above). No significant sex differences were observed in any of the indices shown in Table 1 (ts < 1.952, ps > .064).
Table 1. Descriptive statistics for each index at the three time points (mean ± SD; ranges in parentheses).

| Index | Wave 1 | Wave 2 | Wave 3 | F | p | ηp² |
|---|---|---|---|---|---|---|
| Age, years (range) | 4.01 ± 0.84 (2.10–5.57) | 5.22 ± 0.66 (4.19–6.61) | 6.57 ± 0.63 (5.25–7.61) | 350.515 | < .001 | 0.938 |
| SRS-2 total (range) | 29.52 ± 9.46 (9–46) | 25.96 ± 10.74 (7–48) | 24.30 ± 12.83 (4–69) | 2.841 | .069 | 0.114 |
| WISC-IV FSIQ (range) | | | 104.82 ± 12.63 (85–130) | | | |
| Gazefinder | | | | | | |
| Face without mouth motion | | | | | | |
| Eye region (%) | 46.3 ± 19.5 | 56.3 ± 12.4 | 53.0 ± 18.7 | 0.688 | .508 | 0.029 |
| Mouth region (%) | 18.1 ± 12.8 | 20.8 ± 10.9 | 20.7 ± 13.5 | 2.971 | .061 | 0.114 |
| Face with mouth motion | | | | | | |
| Eye region (%) | 25.3 ± 15.5 | 22.9 ± 16.5 | 26.4 ± 17.6 | 0.521 | .597 | 0.022 |
| Mouth region (%) | 44.9 ± 27.1 | 54.6 ± 18.8 | 52.5 ± 20.7 | 2.407 | .118 | 0.095 |
| Biological motion (BM) | | | | | | |
| Upright (%) | 48.5 ± 18.6 | 53.2 ± 16.5 | 47.2 ± 20.1 | 0.735 | .485 | 0.031 |
| Preference paradigm | | | | | | |
| People region (%) | 45.1 ± 18.9 | 44.5 ± 11.6 | 46.9 ± 12.3 | 0.267 | .708 | 0.011 |
| Finger pointing | | | | | | |
| Social (%) | 39.7 ± 23.3 | 36.1 ± 11.7 | 41.9 ± 12.0 | 0.951 | .378 | 0.040 |
| VABS-II | | | | | | |
| Communication | 100.5 ± 10.5 | 99.0 ± 9.9 | 103.3 ± 8.8 | 1.949 | .154 | 0.078 |
| Socialization | 103.6 ± 6.6 | 100.7 ± 8.4 | 98.1 ± 10.3 | 3.147 | .052 | 0.120 |
| Daily living skills | 103.9 ± 11.3 | 103.5 ± 11.5 | 100.1 ± 12.7 | 1.224 | .304 | 0.051 |
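The ηp² column of Table 1 can be cross-checked from the F values using ηp² = F·df_effect / (F·df_effect + df_error). The degrees of freedom are not reported; the sketch assumes a one-way repeated-measures ANOVA with df_effect = 2 (three waves) and df_error = 46 (24 participants), which reproduces the ηp² values in most rows.

```python
# Partial eta squared recovered from an F statistic and its degrees of freedom.
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """eta_p^2 = F*df_effect / (F*df_effect + df_error)."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Assumed df: (3 - 1) = 2 and (3 - 1) * (24 - 1) = 46; not reported in the text.
print(round(partial_eta_squared(350.515, 2, 46), 3))  # 0.938 (Age row)
print(round(partial_eta_squared(0.688, 2, 46), 3))    # 0.029 (eye region, face without mouth motion)
```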
Of the five models, only (D) the preference paradigm met all four model fit criteria. The percentage of fixation time on the people region of the preference paradigm at Wave 1 affected the VABS-II socialization score at Wave 2 (β = .581, p < .05), whereas the VABS-II socialization score at Wave 1 did not affect the percentage of fixation time on the people region at Wave 2. Between Waves 2 and 3, neither the path from fixation time on the people region to the socialization score nor the reverse path was significant. The results are shown in Figure 3.

Figure 3. Cross-lagged panel model of the percentage of fixation time on social information and VABS-II socialization scores. Note. *p < .05, **p < .01. Values on the paths are standardized coefficients (β); from top to bottom, they are for the human face without mouth motion, the human face with mouth motion, biological motion, the preference paradigm, and finger pointing. The results for (D), the preference paradigm, which met all four model-fit criteria, are shown in bold and underlined.
Although the small sample size must be kept in mind, the percentage of fixation time on the people region of the preference paradigm at Wave 1 predicted socialization at Wave 2. We speculate that this index reflects interest in people and therefore predicted socialization in this age group. That interest in people, as measured by the preference paradigm at younger ages (mean 4.01 ± 0.84 years; range 2.10–5.57 years), predicted later social development is a meaningful new finding regarding social development.
One possible reason why the percentage of fixation time on the people region of the preference paradigm at Wave 1 (mean age: 4.01 years) predicted socialization at Wave 2 (mean age: 5.22 years), whereas the percentage at Wave 2 did not predict socialization at Wave 3 (mean age: 6.57 years), is that the factors influencing both this index and socialization became more complex from Wave 2 to Wave 3 than from Wave 1 to Wave 2. Children around three years of age begin to shift from parallel play, in which they play independently, to associative play, which is a group activity (11), and become able to perform theory of mind tasks (12). At subsequent developmental stages, executive functions such as inhibition, working memory, and set shifting have been reported to develop rapidly by the age of five (40, 41). Children's ability to infer others' mental states, as represented by theory of mind, has also been reported to improve by the age of five, with corresponding changes in learning strategies (42). As these abilities develop during the dramatic changes that follow the age of 3 years, the effect of fixation time on the people region of the preference paradigm on socialization from Wave 2 to Wave 3 may have been attenuated.
Several factors may explain why the models for other social information did not meet the model-fit criteria (GFI, AGFI, CFI, and RMSEA). First, the eye region of human faces, with or without mouth motion, is considered a special kind of social information. Attention to the eye region has been suggested to be related to a wider variety of factors, such as anxiety (43, 44), than attention to other social information. A study using Gazefinder reported no differences in attention to the eye area of face stimuli between the autism and typically developing groups, whereas the autism group showed less attention to the people region in the preference paradigm regardless of age (6). The factors associated with attention to the eye region of human faces therefore appear to differ from those associated with attention in the preference paradigm. In other words, the percentage of fixation time on the eye region may not have significantly affected the subsequent development of socialization because it is associated with a more complex set of factors than fixation on other forms of social information. Second, for BM in the age range targeted in this study, some studies reported that children with autism gazed less at BM (28, 45, 46), others found no difference between children with autism and typically developing children (6), and others reported that the autism group gazed more (47). Attention to BM thus does not appear to simply reflect socialization; various factors may be related to it and may have prevented the percentage of fixation time on upright BM images from fitting the proposed model. Third, for finger pointing, the AoIs consisted of a social area and a geometry area, as in the preference paradigm, yet the model for the percentage of fixation time on the social area did not fit.
Toddlers with autism have been reported to watch the moving hands in a finger-pointing video attentively (48). In the Gazefinder finger-pointing movies, only the people moved. The percentage of fixation time on the social area of finger pointing may therefore not have met the model criteria because attention to moving objects was strong enough to weaken the association between fixation on the social area and socialization. As discussed above, attention to social information in the Gazefinder stimuli other than the preference paradigm may have been influenced by a relatively large number of factors and/or been less affected by the factors that strongly influenced the preference paradigm, and thus did not fit the proposed model.
Our study has several limitations. First, as emphasized above, the sample size is small, so the results should be interpreted as preliminary pilot data. Nevertheless, the finding that the percentage of fixation time on the people region of the preference paradigm at Wave 1 predicted socialization at Wave 2 even under these conditions may indicate a strong effect. Because this is a preliminary study, future research should examine whether these results can inform power analyses and whether they replicate in a larger sample with a narrower age range at each wave. Second, the percentage of fixation time on the people region at Wave 2 did not predict socialization at Wave 3. Although we speculate that the influence of attention to people on social development diminishes as other abilities that affect socialization develop, this assumption needs to be tested. Third, our hypothesis should be examined using indicators of socialization other than the VABS-II. The VABS-II socialization domain covers interpersonal relationships, play and leisure time, and coping skills, and thus represents a broad construct. The VABS-II socialization score was therefore likely influenced by a variety of factors, which may also have affected our results. Further research is needed to determine whether attention to social information predicts scores that cover narrower concepts than the one used here. Fourth, it is important to identify predictors of socialization beyond attention to social information; a variety of such factors have been reported.
For example, socialization has been reported to be influenced by the closeness of the teacher–child relationship (49), parental warmth and children's emotional regulation (50), and socioeconomic status (51). As noted earlier, development is recognized as the cumulative consequence of complex interactions and transactions among various systems, called a developmental cascade (1). Building on the present research, exploring factors other than attention to social information that relate to the development of socialization will help unravel how socialization develops.
In the present study, we aimed to determine how attention to social information affects the development of socialization in typically developing children. Our results suggest that the degree of interest in people has a strong positive influence on the development of socialization during early childhood, a meaningful new finding that clarifies one aspect of this developmental process. Our results could also inform developmental support for individuals with autism, who have deficits in social communication and social interaction: by encouraging interest in people and teaching them where to direct attention in social situations, it may be possible to help them acquire better adaptive behaviors.