Influencing factors and group differences in medical data sharing in clinical research scenarios

Medical data, derived from the diagnosis and treatment activities of medical institutions, reflects patients’ physiology and health status (Gao, 2020). As a core component of medical decision-making and research (Zhuang, 2023), it holds significant potential value. With the rise of precision medicine and personalized therapies, clinical researchers increasingly demand high-quality, large-scale medical data. Because this data is the cornerstone of clinical research, its sharing and utilization have become crucial for advancing medical progress (Scheibner et al., 2021). Specifically, medical data sharing can facilitate cross-institutional and cross-regional collaborative research, accelerate the development of new drugs and the optimization of disease diagnosis and treatment (Johnson et al., 2021; Kaul et al., 2020), reduce research duplication, conserve scientific resources, and enhance research reproducibility and transparency (Flanagin et al., 2022; Rowhani-Farid & Barnett, 2016). Thus, in the era of big data and real-world research, medical data sharing in clinical research scenarios serves as a novel approach to exploring disease mechanisms, expanding medical knowledge, and driving medical innovation (Wei et al., 2021). As shown in Figure 1, six data streams are formed in the medical data sharing system. Patients, as medical data subjects, are the actual owners of the data, such as the original medical examination reports they hold. The law grants them ownership, informed consent, access, and deletion rights. Medical institutions can use medical data for administrative collaboration, cross-institutional public health emergency response and clinical scientific research when patients sign informed consent or when consent is exempted by law. Medical data sharing in clinical research scenarios involves four major stakeholders: the medical data controller, user, processor, and supervisor. 
Its core objective is to achieve data circulation and efficient utilization between controllers and users. Under government supervision and management, and with patient informed consent and ethical oversight as the foundation of compliance, medical institutions can entrust medical data centers to desensitize the original data and transfer it to clinical researchers for use. Some de-identified data can be directly authorized by medical institutions for controlled access by clinical researchers and used for scientific research. In their interpretation of the data value chain, Ma et al. (2024) pointed out that the commercialization of data is the core link in realizing the value of data, and that the acquisition and usage behaviors of users, as key participants in this link, directly affect the process of data commercialization.

However, medical data sharing is not without obstacles. It involves complex ethical, legal, technical and managerial issues (Scheibner et al., 2021). To help clinical researchers obtain and use medical data more effectively, to improve the quality and speed of clinical research, and to understand how medical data sharing differs among user groups of different genders, years of research experience, institutions, professional identities, and data sharing experiences, insight into medical data sharing behaviors in clinical research scenarios is indispensable. In each subdiscipline of precision medicine research, researchers generally face the challenge of processing large-scale, complex and high-dimensional data, and these valuable data resources are often confined within their own research institutions, forming data islands that seriously hinder the reproducibility of research (Sriram et al., 2025). Although the intention to share medical data has increased over time, actual medical data sharing has not increased significantly (Locher et al., 2023). Data users often find it difficult to make effective use of shared data because of a lack of access rights, inconsistent data formats and uneven data quality. At present, research on the factors influencing medical data sharing has primarily focused on the sharing intentions of patients or researchers (Fujita et al., 2024). These studies mainly investigate the barriers to data sharing among data controllers (Vickers, 2016). Few studies have examined the clinical research context from the perspective of data users to explore the motivations for data sharing. 
Furthermore, most empirical studies have overlooked group differences among participants in medical data sharing, and their conclusions may not fully capture the behavioral characteristics of medical data user groups with different personal and professional backgrounds in clinical research scenarios. In particular, with respect to the integrated analysis of multi-dimensional influencing factors, most studies focus on only one or a few factors and do not systematically integrate individual characteristics, psychological states, organizational environments, and data features, which makes it difficult to comprehensively reveal the complex mechanisms behind the behavior of medical data users.

To promote research on medical data sharing, accelerate the translation of clinical research achievements, and enhance the well-being of patients, this study focuses on medical data users, analyzes their group differences in depth, integrates multi-dimensional influencing factors, and proposes a conceptual model of medical data user sharing behavior in clinical research scenarios (CRS-USB). Given the scalability of the UTAUT model and its application in the field of data sharing (Lv & Wang, 2023), integrating the theories of trust and self-efficacy can promote the transformation of sharing strategies towards a composite mechanism. That is, the trust mechanism can alleviate concerns about the abuse of sensitive data, and self-efficacy empowerment can overcome technical operation obstacles, thus resolving the predicament of “high sensitivity - strong professionalism” in medical data. This will further enhance the trusted management and control capabilities of medical data, the efficiency of resource interaction and the potential for value co-creation, laying a theoretical foundation for a secure and controllable medical trusted data space. The study addresses the following questions: (1) What motivates medical data users to obtain and use data in clinical research? (2) How do the factors influencing the sharing behavior of medical data users interact with each other? (3) In the sharing of medical data, are there significant differences among user groups with different personal and professional backgrounds?

2

Literature review

2.1

Influencing factors of medical data sharing

Scholars mainly explore the factors affecting medical data sharing in terms of driving forces and potential barriers, and they focus on the perspectives of researchers and patients. Researchers are reluctant to share data because of possible improper use of the data, limited institutional resources, fear of being preempted by others publishing from the data before they do, or loss of reputation if other researchers find errors in the data (Geneviève et al., 2019). Patients and the public generally support data sharing for health research, but they weigh issues of value, privacy, risk, data security, trust, transparency and accountability, and they are concerned about breaches of confidentiality and potential misuse of data (Kalkman et al., 2022). Hutchings et al. (2020) found that data ownership, privacy, intellectual property rights and the possibility of misuse of data hinder sharing. In particular, medical data is highly knowledge-dependent, which easily arouses concern about the risks of sharing. These drivers and obstacles can be classified into technological, motivational, economic, political, sociocultural, ethical, and legal dimensions (Auffray et al., 2016; Edwards et al., 2010; Mostert et al., 2016; Van Panhuis et al., 2014; Vest & Gamm, 2010).

2.2

The application of the UTAUT model

The unified theory of acceptance and use of technology (UTAUT) is a theoretical model proposed by Venkatesh et al. in 2003 to explain and predict individuals’ acceptance and use of information technology (Tan et al., 2009; Venkatesh et al., 2016). Researchers improve the explanatory power of the model by introducing external variables, integrating variables from other theories and adding situational variables. For example, Niu et al. (2022) introduced government trust and technology trust into the UTAUT model and constructed and tested a conceptual model of the public’s use of contact tracing technology. In the medical field, COVID-19 fear and social isolation were introduced into UTAUT models as exogenous components, and their significant effects on behavioral intention were verified (Suliman et al., 2024). In a study of SMEs’ intention to share data, Guo et al. (2023) integrated social exchange theory and UTAUT to build an influencing factor model, and explored the impact of organizational rewards, reciprocal welfare, expectation fulfillment and job relevance motivations on SMEs’ emotional trust and technology trust during knowledge sharing in the context of social media in mature enterprises. Lv and Wang (2023) took perceived risk and resistance to change as starting variables, integrated UTAUT and the theory of planned behavior, and explored the impact of perceived ease of use, perceived usefulness, and social norms on the willingness of small and medium-sized enterprises to share data.

2.3

Research review

The research perspective on the factors influencing medical data sharing is relatively narrow, with a preponderance of studies focusing on barriers to sharing. Most of these studies concentrate on issues related to technology, policy, and management. Few studies have explored the needs, expectations, and concerns of clinical researchers regarding medical data sharing from their own perspective. Scholars mostly use interviews and fieldwork to study medical data sharing. There is a lack of theoretical models built on factors specific to clinical researchers, as well as of empirical research on the data sharing mechanisms of medical data users in clinical research scenarios. Explaining up to 70% of the variance in users’ behavioral intention (Venkatesh et al., 2003), the UTAUT model outperforms single-theory models and is well suited to predicting and explaining medical data users’ acceptance of sharing behaviors. The scalability of the UTAUT model allows researchers to adapt it to specific clinical research contexts. This study incorporates key variables such as trust and data transparency, further expanding and refining the conceptual model so that it captures the complex motivations behind medical data sharing behavior more comprehensively.

3

Conceptual model development

3.1

Hypotheses development

3.1.1

Internal motivations and sharing intention

Performance expectation, as an internal driver, refers to the belief that users hold when using a product, service or technology, that is, they believe that the use of these resources can significantly improve the quality and efficiency of their life, work or other aspects (Niu et al., 2022; Xu, 2011). The UTAUT model holds that performance expectation is the main determinant of user intention (Venkatesh et al., 2003). In terms of the sharing of medical data, the belief of users that they can achieve data empowerment and talent cultivation in the process of obtaining and using medical data is a key factor influencing their willingness to participate in sharing activities. Effort expectation, as a key factor in the UTAUT model, is equivalent to perceived ease of use in the technology acceptance model (Wu et al., 2019), reflecting individuals’ perception of task difficulty, as well as their assessment of their own abilities and resources. Huang et al. (2013) applied the technology acceptance model to demonstrate the positive impact of perceived ease of use on users’ willingness to share knowledge in virtual communities. In clinical research, medical data sharing involves multiple links such as data collection, organization, transmission and use, and these links may involve various challenges such as technical operation, privacy protection and data security. If users think these processes are rather complicated or require a lot of effort, their willingness to share may be reduced. On the contrary, when users believe that the process of medical data sharing is simple and they can obtain the needed data proficiently and conveniently, they are more willing to participate in medical data sharing. The starting point of trust theory is interaction, which emphasizes the important role of trust in interpersonal relationship (Dong & Duan, 2024). 
As an internal motivation, trust is the basic condition for social exchange, and it affects people’s intention and behavior to participate in a certain activity. Following Rotter’s (1967) understanding, trust can be divided into interpersonal trust and people’s trust in a community or platform. In clinical research scenarios there is also trust in data: medical data users believe that the medical data obtained is complete and true, and that the data itself has academic research value. Medical data sharing can be regarded as a kind of social exchange behavior. The more medical data users trust the data controller, the sharing platform and the medical data itself, the stronger their willingness to participate in medical data sharing. Therefore, this study proposes the following hypotheses:

H1.

Performance expectancy positively influences users’ sharing intention.

H2.

Effort expectancy positively influences users’ sharing intention.

H3.

Trust positively influences users’ sharing intention.

3.1.2

External motivations and sharing intention

Social influence derives from the theory of reasoned action and reflects the extent to which individuals are influenced by the concepts, behaviors and social norms of the surrounding groups in their social environment. In research on the factors influencing Weibo users’ forwarding behavior, community influence was found to have a positive impact on users’ forwarding intention (Xu et al., 2022). Community influence can enhance the willingness to share medical data through interpersonal recommendations, organizational support and similar channels. When friends use shared data and recommend it, a demonstration effect is formed. Meanwhile, support and encouragement for data sharing from academic research institutions can further enhance individuals’ acceptance of data sharing. In data governance research, data transparency is defined as the degree to which a platform is open about the source, quality, and purpose of the data it collects and processes (Wang & Huang, 2023). GS1, the international article numbering organization, applies the “Digital Product Passport” to increase data transparency, enabling the sharing of traceability and visibility data (Xie & Liang, 2023) and increasing the intention to share data. In research on medical data sharing, data transparency means that doctors and other authorized personnel can access the data and that the methods of obtaining and processing medical data are open and transparent. A transparent data sharing mechanism not only ensures the accuracy and security of data, but also enables researchers to clearly understand the legal uses of the data, and thereby to participate in data sharing more actively. Therefore, this study proposes the following hypotheses:

H4.

Community influence positively influences users’ sharing intention.

H5.

Data transparency positively influences users’ sharing intention.

3.1.3

Internal motivation and sharing behavior

Sharing behavior refers to the actual sharing actions taken by medical data users. Research on knowledge sharing has shown that trust has a significant positive effect on sharing behavior (Li et al., 2021). Hypothesis 3 posits that trust has a positive impact on users’ willingness to share; the positive influence of trust on sharing behavior is in turn reflected in the sharing activities that users carry out on the basis of trust. Intention refers to people’s expectations of their own behavior, that is, the subjective tendency of individuals to engage in a certain behavior (Li, 2013). Some scholars have studied the relationship between data sharing intention and sharing behavior (Bi et al., 2020; Shi et al., 2023). In clinical research scenarios, the willingness of medical data users to share refers to their subjective tendencies and expected behaviors towards obtaining and using medical data. The stronger the willingness of medical data users to obtain data, the more likely they are to actually obtain and use it. Individual innovation reflects the initiative, openness and adaptability of individuals in the process of innovation. People with a high degree of innovation are more likely to have a positive view of new technologies (Abu-Al-Aish & Love, 2013; Agarwal & Karahanna, 2000). As an internal motivation, individual innovation promotes behavior mainly through the individual’s own internal characteristics and psychological tendencies. Previous studies have explored the influence of innovation on individual behavioral decision making (He & Luo, 2016). Ju and Lee (2021) focused on the specific dimensions of the internal structure of innovation and found that cognitive innovation can significantly improve the actual adoption probability of new products. 
Individuals with strong innovation tend to explore new research paths, are willing to be among the first to try obtaining key information through medical data sharing, and carry out relevant practices earlier than others in the research team. Self-efficacy refers to an individual’s subjective judgment of, and confidence in, whether they can successfully complete a certain task and achieve the expected goal in a specific situation. It mainly reflects a person’s perception of and belief in their own ability (Bandura, 1978) and includes two core parts: outcome expectation and efficacy expectation. Research shows that higher self-efficacy is a key factor in predicting individuals’ more active participation in online knowledge sharing (Chen & Hung, 2010; Sun et al., 2023; Xu, 2011). Medical data users with high self-efficacy are more convinced that they can obtain the data they need, believe that they can complete data sharing tasks through effort, and can find multiple solutions when facing difficulties. This confidence enables them to engage in sharing behavior more actively and promotes the smooth progress of data sharing. Therefore, this study proposes the following hypotheses:

H6.

Trust positively influences users’ sharing behavior.

H7.

Sharing intention positively influences users’ sharing behavior.

H8.

Individual innovation positively influences users’ sharing behavior.

H9.

Self-efficacy positively influences users’ sharing behavior.

3.2

The conceptual model of CRS-USB

Existing literature has provided an important theoretical basis and empirical evidence for this research. In the practice of data sharing, trust plays a crucial role in promoting the formation of individuals’ willingness to share. It can alleviate scholars’ concerns about data being misused, plagiarized or improperly attributed, reduce the risk expectations of “free-riding” behavior, and thereby enhance the willingness to share (Kim, 2022). At the same time, self-efficacy also demonstrates significant advantages when dealing with complex data sharing tasks. People with high self-efficacy are more confident in handling technical challenges such as data cleaning, metadata annotation, and format standardization, reducing avoidance behaviors caused by “incompetence”(Mattern et al., 2024). In fact, the two core contradictions of “dare to share” and “can operate” addressed by trust theory and self-efficacy theory in data sharing-related research extend to the medical data sharing scenario. The sensitivity of medical data makes trust a prerequisite for sharing, while also increasing the rigor and complexity of sharing technology operations. The technical threshold leads to differences in self-efficacy. UTAUT originally focused on technology acceptance, but medical data sharing also involves data sensitivity and the rigor of ethical review, and must be carried out through a data sharing technology platform. Therefore, the UTAUT model is integrated with trust theory and self-efficacy theory. This promotes the transformation of sharing strategies from technical convenience to a two-dimensional framework of “risk-ability”. On this basis, in accordance with the characteristics of medical data and its users, data transparency and individual innovation are introduced to jointly explore how internal and external drivers affect sharing behavior by influencing the sharing willingness of medical data users, and whether internal drivers can directly affect sharing behavior. 
In addition, in order to analyze the differences in medical data sharing among different user groups, variables such as years of scientific research work, professional identity, data level needs, and sharing experience are added. The conceptual model of medical data users’ sharing behavior is shown in Figure 2.

4

Methodology

4.1

Measure

This study used a questionnaire survey to collect sample data. Parts of the questionnaire draw on classic literature in the field of data sharing, and experts in the field were consulted. Before the formal survey, the research team carried out a pilot survey in a university medical group and a hospital department and recovered 65 valid questionnaires. In response to the questions and suggestions raised, the team visited some of the respondents, revised the questionnaire repeatedly through group discussion, and finalized the formal questionnaire.

The formal questionnaire is divided into three parts. The first part describes the questionnaire and provides background information. The second part collects the respondents’ basic information, including gender, age, education, occupation and working years. The third part is the core content: a scale of the factors influencing medical data sharing, built around 9 latent variables with a total of 36 items, as shown in Table 1. A 5-point Likert scale was used for measurement, and the respondents were invited to rate each item from 1 (“strongly disagree”) to 5 (“strongly agree”) according to their actual situation.

Table 1.

Scales and supporting literature

Latent variable	Indicator number	Observation index	Reference source
Performance expectation	PE1	It can enable you to obtain rich data resources	Venkatesh(2003) Yang(2010)
	PE2	It can enhance the efficiency of your clinical research
	PE3	It can promptly meet your data requirements
	PE4	It is helpful for academic research institutions to cultivate talents
Effort expectation	EE1	You think the process of medical data sharing is easy to understand and operate
	EE2	You can skillfully share medical data
	EE3	You can obtain the data you need very conveniently
	EE4	The process of applying to the ethics committee for approval and submitting a research plan is relatively simple
Trust	TR1	Medical data itself has academic research value	Chang(2011)
	TR2	The reliability of medical data sharing channels (platforms)
	TR3	The medical data obtained is complete and true
	TR4	The controller will sincerely share medical data
Community influence	CI1	Other clinical researchers share medical data	Venkatesh(2003)
	CI2	Seeing a friend using shared data and recommending it to you
	CI3	The research team suggests that you obtain and use shared data for clinical research
	CI4	Academic research institutions support you in sharing medical data
Data transparency	DT1	Doctors and other authorized personnel can access the data	Yang(2014) O’Malley et al.(2009)
	DT2	Data can be used for various purposes such as research and medical decision-making
	DT3	The methods for obtaining and processing medical data are open and transparent
	DT4	Medical institutions and professionals are responsible for the accuracy and security of the data
Individual innovation	II1	You enjoy trying new ways to carry out academic practice	Agarwal(1998) Goldsmith(2002) Yen(2020)
	II2	You are willing to try to obtain the required data through medical data sharing at the first time
	II3	You were among the first in the research team to share medical data
	II4	You often use shared data for clinical research
Self-efficacy	SE1	You believe that you can obtain the data you want	He & Chang(2014)
	SE2	You believe that through your efforts, you can achieve medical data sharing
	SE3	When sharing medical data, you have multiple ways to overcome difficulties
	SE4	You will feel confident in participating in the sharing of medical data
Sharing intention	SI1	You are willing to participate in the medical data sharing activities	Bock(2005) Taylor(1995)
	SI2	You are willing to use data generated by other medical institutions in accordance with ethical and regulatory requirements
	SI3	You are willing to pay for high-quality medical data
	SI4	You are willing to promote the importance of medical data sharing to your colleagues and classmates around you
Sharing behavior	SB1	You often participate in medical data sharing activities
	SB2	You regularly browse the data on the medical data sharing platform
	SB3	You often use the data on the medical data sharing platform
	SB4	You participate in training or meetings related to data sharing

4.2

Data collection

In this study, snowball sampling was used. Questionnaire links were generated on the Wenjuanxing (Questionnaire Star) platform and sent to clinical research team members in universities, research institutes and medical institutions, who were asked to recommend peers or colleagues meeting the research conditions and thereby spread the links. At the same time, paper questionnaires were distributed through offline field research, and respondents were asked to recommend further potential respondents. By combining online and offline methods, the sample covered seven major regions in China, namely Central China, South China, Northeast China, Northwest China, North China, Southwest China and East China, which enhanced the diversity of the sample and the generalizability of the research conclusions. The questionnaire was collected from December 27, 2024 to February 15, 2025, and 402 questionnaires were received. To improve the accuracy of the research results, questionnaires that were incomplete, showed a high rate of repeated answers, were completed in too short a time, or indicated little understanding of medical data sharing were removed during screening, leaving 360 valid questionnaires. The effective recovery rate was 89.6%.
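The effective recovery rate reported above follows directly from the two questionnaire counts. As a minimal sketch (the function name is illustrative and not part of the study's tooling), the arithmetic can be checked as follows:

```python
def effective_recovery_rate(collected: int, valid: int) -> float:
    """Return valid questionnaires as a percentage of those collected."""
    if collected <= 0:
        raise ValueError("collected must be a positive count")
    if not 0 <= valid <= collected:
        raise ValueError("valid must lie between 0 and collected")
    return 100.0 * valid / collected


# Counts reported in the text: 402 collected, 360 retained after screening.
rate = effective_recovery_rate(collected=402, valid=360)
print(f"{rate:.1f}%")  # 89.6%
```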

4.3

Data analysis

Structural equation modeling can represent the complex relationships between latent variables and measured variables (Narayanan, 2012). SPSS 21.0 and AMOS 23.0 were used to evaluate and test the measurement model and the structural model with maximum likelihood estimation. Reliability and validity are important indexes for evaluating measurement models. Reliability reflects the consistency and stability of the measurement tool, and validity reflects whether the scale measures what it is intended to measure, including content validity and construct validity. Content validity refers to the fit between the content measured by the scale and the target concept. Because the scale is based on mature scales and multiple rounds of expert consultation, it covers the core dimensions, which supports its content validity. Construct validity concerns the consistency of the measurement items with the theoretical constructs and is reflected in convergent validity and discriminant validity.

5

Results

5.1

Descriptive statistics

Descriptive statistics of the sample characteristics reveal the distribution of the respondents in terms of demographics, occupational identity and data sharing experience. Of the 360 clinical researchers surveyed, 50.6% were male and 49.4% were female, so the sample is balanced in gender. Most respondents were under 40 years old, which indicates that the sample group is relatively young and in the early stage of a scientific research career. In terms of educational background, all respondents held an undergraduate degree or above, and master’s and doctoral students together accounted for 53.0%, which suggests that the sample group has a solid academic background and research ability. The respondents’ institutions were mainly affiliated hospitals and institutions of higher learning, which together accounted for nearly 70%, indicating that these institutions are the main venues for scientific research activities and reflecting the close integration of clinical work and research. In terms of occupational identity, the respondents were mainly medical students, followed by clinicians and researchers; together these groups accounted for more than 90%, which further highlights the professionalism and practical orientation of the sample group in the field of clinical research. In terms of years of scientific research work, 52.8% of the respondents were new researchers with less than 5 years of experience, 30.8% had 6-10 years, 12.2% had 11-15 years, and less than 5% had more than 16 years. This distribution may be related to the relatively high proportion of young people in the sample. It also shows that research experience varies within the sample. 
In terms of data sharing demand, the sample group showed a high level of demand, among which the proportion of “very need” and “certain need” reached 89.2%, indicating that data sharing has important practical significance in current clinical research practice. In addition, 60.8% of the samples had experience in data sharing, indicating that data sharing has been practiced and recognized in the sample group to a certain extent, but there are still some people who lack relevant experience and their needs for data sharing have not been met.

In addition, the 36 items in the scale section were analyzed statistically to check the quality of the scale data. The minimum value of each item is 1 and the maximum value is 5, indicating that there are no out-of-range values in the scale data. The mean values of the items lie between 3.66 and 4.22, and the standard deviations range from 0.79 to 1.062, indicating that the dispersion of the data is moderate and that there are some individual differences in the respondents’ scores. In terms of normality, data can be accepted as normal when skewness is between -2 and 2 and kurtosis is between -7 and 7 (Kline, 2016). The skewness values of the sample data range from -1.242 to -0.457, and the kurtosis values range from -0.477 to 2.358, so the data can be regarded as conforming to the normal distribution.
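The normality screen described above can be sketched in a few lines. This is an illustrative version only: it uses the simple moment-based definitions of skewness and excess kurtosis, whereas SPSS reports bias-corrected sample estimators, so values for small samples will differ slightly. The function names and the example responses are assumptions made for illustration.

```python
from statistics import fmean


def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis of one item's scores."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    if m2 == 0:
        raise ValueError("all values are identical; moments are undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def within_normality_bounds(skew, kurt, skew_limit=2.0, kurt_limit=7.0):
    """Apply the thresholds cited in the text: |skew| <= 2 and |kurtosis| <= 7."""
    return abs(skew) <= skew_limit and abs(kurt) <= kurt_limit


# Hypothetical 5-point Likert responses for a single item.
responses = [5, 4, 4, 3, 5, 4, 2, 4, 5, 3]
skew, kurt = skewness_and_kurtosis(responses)
print(within_normality_bounds(skew, kurt))

# The extreme values reported in the text also satisfy the thresholds.
print(within_normality_bounds(-1.242, 2.358))
```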

5.2

Reliability and validity analysis

Cronbach’s α coefficient is commonly used to test the reliability of a scale. Its value usually ranges from 0 to 1. When α is between 0.7 and 0.8, the reliability of the scale is considered high; when α reaches 0.9 or above, the reliability is very high. In this study, SPSS 21.0 was used to test the reliability of the scale. The Cronbach’s α of each latent variable ranged from 0.754 to 0.830, and the overall Cronbach’s α of the scale was 0.915, indicating that the scale has high overall reliability and measures the target variables stably.
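For readers who want to reproduce this kind of reliability check outside SPSS, the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the total score) can be sketched as below. The function name and the response matrix are hypothetical; the study's raw item scores are not available here.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the summed score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of six people to a four-item construct (1-5 scale).
responses = [
    [4, 4, 5, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 5, 4, 4],
    [3, 2, 3, 3],
]
print(round(cronbach_alpha(responses), 3))
```

The same function can be applied to the items of a single latent variable or to all items at once, which corresponds to the per-construct and overall coefficients reported above.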

Before factor analysis, the KMO measure and Bartlett’s test of sphericity are needed to confirm that the data are suitable for factor analysis. The KMO value was 0.875, close to 1, which indicates that the partial correlations between variables are small relative to their simple correlations, that is, the variables share substantial common variance. The significance level of Bartlett’s test of sphericity (sig.) was far below 0.05, which indicates significant correlations between variables. The data were therefore suitable for factor analysis.
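Both statistics can be computed directly from a correlation matrix. The sketch below assumes the usual textbook definitions: KMO is the ratio of summed squared correlations to summed squared correlations plus summed squared partial correlations, and Bartlett's statistic is χ² = -(n - 1 - (2p + 5)/6) · ln|R| with p(p - 1)/2 degrees of freedom. The function names and the example matrix are hypothetical.

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure from a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)            # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)      # mask for off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and degrees of freedom for n cases."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    return chi2, p * (p - 1) // 2

# Hypothetical 3 x 3 correlation matrix with all correlations equal to 0.5.
R_example = np.array([[1.0, 0.5, 0.5],
                      [0.5, 1.0, 0.5],
                      [0.5, 0.5, 1.0]])
print(kmo(R_example), bartlett_sphericity(R_example, n=360))
```

A large chi-square relative to its degrees of freedom leads to rejecting the hypothesis that the correlation matrix is an identity matrix, which is the condition the text describes as a significance level far below 0.05.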

Additionally, the communality (extracted common factor variance) of each observed variable is greater than 0.5, indicating that the extracted common factors represent and explain the observed variables well. On this basis, the rotated component matrix was analyzed. Based on how the items clustered and on their content, the three items DT1, II2 and SI2 were removed, after which the clustering of the remaining items was basically clear. Principal component analysis was used to extract factors with eigenvalues greater than 1, and varimax (maximum variance) rotation converged after 7 iterations. A total of 9 factors were obtained, corresponding to the 9 variables, and together they explained 65.692% of the total variance.

Convergent validity was evaluated in confirmatory factor analysis using standardized factor loadings, average variance extracted (AVE) and composite reliability (CR). An AVE greater than 0.36 is acceptable and greater than 0.5 is ideal (Kline, 1998; Yan et al., 2023). The confirmatory factor analysis results in Table 2 show that all standardized factor loadings are above 0.6, indicating that the measurement items effectively reflect their corresponding latent variables. All AVE values exceed 0.36, and several approach or exceed 0.5, which supports the scale’s convergent validity. The composite reliability (CR) values, all above 0.7, also confirm the scale’s internal consistency.

Table 2.

Confirmatory factor analysis results.

Constructs	Items	Factor loadings	AVE	CR
Performance expectation	PE4	0.654	0.445	0.761
	PE3	0.61
	PE2	0.662
	PE1	0.736
Effort expectation	EE4	0.715	0.513	0.809
	EE3	0.761
	EE2	0.676
	EE1	0.713
Trust	TR4	0.704	0.495	0.796
	TR3	0.737
	TR2	0.685
	TR1	0.686
Community impact	CI4	0.78	0.507	0.803
	CI3	0.763
	CI2	0.614
	CI1	0.679
Data transparency	DT4	0.769	0.496	0.745
	DT3	0.72
	DT2	0.614
Individual innovation	II1	0.712	0.479	0.734
	II3	0.664
	II4	0.699
Self efficacy	SE1	0.787	0.548	0.829
	SE2	0.759
	SE3	0.71
	SE4	0.701
Sharing intention	SI1	0.736	0.477	0.731
	SI3	0.611
	SI4	0.719
Sharing behavior	SB1	0.744	0.550	0.830
	SB2	0.728
	SB3	0.755
	SB4	0.738
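The AVE and CR columns of Table 2 follow from the standardized loadings. A minimal sketch, assuming the common formulas AVE = Σλ²/k and CR = (Σλ)² / ((Σλ)² + Σ(1 - λ²)), reproduces the reported values for performance expectation (0.445 and 0.761) and sharing behavior (0.550 and 0.830). The function names are illustrative.

```python
def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """Composite reliability from standardized loadings."""
    total = sum(loadings)
    error = sum(1 - l * l for l in loadings)  # error variance of each item
    return total * total / (total * total + error)

# Standardized loadings taken from Table 2.
pe_loadings = [0.654, 0.61, 0.662, 0.736]    # performance expectation
sb_loadings = [0.744, 0.728, 0.755, 0.738]   # sharing behavior

for name, lam in [("PE", pe_loadings), ("SB", sb_loadings)]:
    print(name, round(ave(lam), 3), round(composite_reliability(lam), 3))
```

Because both quantities depend only on the loadings, the same two functions can be used to check any other construct in the table.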

Discriminant validity was assessed by comparing the correlation coefficients between latent variables with the square roots of their AVE values, as shown in Table 3, where the diagonal entries are the square roots of the AVE values. All correlation coefficients were lower than the square roots of the corresponding AVE values, confirming that the scale distinguishes the different constructs. For instance, the correlation coefficient between performance expectation (PE) and effort expectation (EE) was 0.301, well below the square root of PE’s AVE value (0.667). This finding further supports the scale’s construct validity.

Table 3.

Discriminant validity test results.

Constructs	PE	EE	TR	CI	DT	II	SE	SI	SB
Performance expectation	0.667
Effort expectation	0.301	0.716
Trust	0.52	0.525	0.704
Community impact	0.519	0.289	0.459	0.712
Data transparency	0.497	0.216	0.462	0.399	0.704
Individual innovation	0.376	0.506	0.39	0.41	0.357	0.692
Self efficacy	0.372	0.354	0.366	0.427	0.359	0.354	0.740
Sharing intention	0.412	0.398	0.37	0.377	0.359	0.525	0.413	0.691
Sharing behavior	0.438	0.542	0.418	0.473	0.341	0.458	0.459	0.447	0.742
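The comparison in Table 3 corresponds to the Fornell-Larcker criterion: each inter-construct correlation should be smaller than the square root of the AVE of both constructs involved. The sketch below applies this check to three of the nine constructs, using AVE values from Table 2 and correlations from Table 3. The function name and the dictionary layout are illustrative choices.

```python
import math

# AVE values (Table 2) and correlations (Table 3) for three constructs.
ave_values = {"PE": 0.445, "EE": 0.513, "TR": 0.495}
correlations = {("PE", "EE"): 0.301, ("PE", "TR"): 0.52, ("EE", "TR"): 0.525}

def fornell_larcker_violations(ave_values, correlations):
    """Return construct pairs whose correlation reaches sqrt(AVE) of either one."""
    violations = []
    for (a, b), r in correlations.items():
        threshold = min(math.sqrt(ave_values[a]), math.sqrt(ave_values[b]))
        if abs(r) >= threshold:
            violations.append((a, b, r))
    return violations

print(fornell_larcker_violations(ave_values, correlations))
```

An empty list means that the criterion holds for every pair supplied; extending the two dictionaries to all nine constructs covers the full table.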

5.3

Model test

The model fit index χ²/df was 1.838, below the threshold of 3. NFI, RFI, TLI and other indices were above 0.8, IFI and CFI were above 0.98, and RMSEA was 0.046, below the threshold of 0.08. The overall fit of the model is therefore good. The 9 hypotheses were then tested to determine the strength of each hypothesized path. As shown in Table 4, all hypotheses except H3 (P=0.382>0.05) were supported, with significant positive effects.

Table 4.

Hypotheses test results.

Hypotheses	Path	β	t	P	Test results
H1	Performance expectation → Sharing intention	0.195	2.098	*	Support
H2	Effort expectation → Sharing intention	0.336	4.145	***	Support
H3	Trust → Sharing intention	-0.086	-0.874	0.382	Not supported
H4	Community Impact → Sharing intention	0.175	2.171	*	Support
H5	Data Transparency → Sharing intention	0.181	2.179	*	Support
H6	Trust → Sharing behavior	0.199	2.935	**	Support
H7	Sharing intention → Sharing behavior	0.199	2.892	**	Support
H8	Individual innovation → Sharing behavior	0.249	3.457	***	Support
H9	Self efficacy → Sharing behavior	0.240	3.741	***	Support

Note: * means significant at P < 0.05 level, ** means significant at P < 0.01 level, *** means significant at P < 0.001 level.
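The significance markers in Table 4 can be related to the reported t values. The sketch below assumes a large-sample normal approximation for the two-sided P value (the source does not state how P was computed) and maps P to the star notation of the table note. Under this assumption the P value for H3 comes out at about 0.382, as reported. The function names are illustrative.

```python
import math

def two_sided_p(t):
    """Two-sided P value for a test statistic under a standard normal reference."""
    return math.erfc(abs(t) / math.sqrt(2.0))

def stars(p):
    """Star notation used in the table note."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# t values taken from Table 4.
t_values = {"H1": 2.098, "H2": 4.145, "H3": -0.874, "H4": 2.171, "H5": 2.179,
            "H6": 2.935, "H7": 2.892, "H8": 3.457, "H9": 3.741}

for name, t in t_values.items():
    p = two_sided_p(t)
    print(name, round(p, 4), stars(p))
```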

5.4

Difference analysis

To examine the differences in direct influencing factors of medical data sharing behaviors across survey subjects with varying genders, years of scientific research experience, institutions, professional identities, and data sharing experience, independent sample t-tests and one-way ANOVA were conducted. Significant results are presented in Tables 5 and 6.

Table 5.

One-way ANOVA test results.

Variable	Category	Trust		Sharing intention		Individual innovation		Self efficacy		Sharing behavior
		Mean value	Standard deviation	Mean value	Standard deviation	Mean value	Standard deviation	Mean value	Standard deviation	Mean value	Standard deviation
Years of research work	Less than 5 years	3.836	0.815	4.005	0.829	3.754	0.793	3.812	0.839	3.691	0.857
	6-10 years	4.176	0.538	4.177	0.531	3.982	0.671	4	0.646	4.052	0.658
	11-15 years	4.034	0.702	4.174	0.422	4.061	0.656	4.08	0.833	4	0.605
	16-20 years	4.225	0.606	4.033	0.745	4.133	0.526	4.1	0.709	3.5	0.95
	More than 21 years	4	0.685	4.667	0.577	4.267	0.925	4	0.884	4.4	0.518
	F	4.323**		2.138		3.21*		1.802		5.481**
	LSD post hoc	1<2; 4<5				1<2,3				1<2,3,5; 2>4
Professional identity	Medical student	3.75	0.856	4.054	0.842	3.723	0.877	3.772	0.893	3.657	0.92
	Clinician	4.102	0.667	4.156	0.553	4.042	0.629	4.031	0.721	4	0.737
	Researcher	4.234	0.474	4.152	0.553	3.975	0.598	4.003	0.673	3.981	0.574
	F	10.223**		2.066		4.313**		2.782*		5.092**
	LSD post hoc	1<2,3		1<2,3				1<2,3		1<2,3
Data level requirement	Level 1	4.074	0.683	4.000	0.687	3.549	0.526	4.074	0.564	3.838	0.701
	Level 2	3.976	0.547	3.875	0.646	3.944	0.669	3.767	0.750	3.757	0.662
	Level 3	4.097	0.696	4.234	0.661	3.790	0.832	3.908	0.911	3.860	0.775
	Level 4	3.911	0.823	4.121	0.749	3.909	0.765	3.940	0.793	3.845	0.860
	Level 5	4.202	0.445	4.206	0.477	3.905	0.700	4.048	0.546	4.107	0.620
	F	1.352		2.608*		1.26		1.014		0.804
	LSD post hoc			2<3,4

Note: * means significant at P < 0.05 level, ** means significant at P < 0.01 level.
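The F statistics in Table 5 come from one-way ANOVA, which compares between-group and within-group variability. Since the group sizes and raw scores are not reported, the sketch below uses a small hypothetical data set only to show how the statistic is formed; the function name is illustrative.

```python
import numpy as np

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = np.concatenate(groups).mean()
    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variability of scores around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical trust scores for three groups (not data from the study).
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(toy_groups))
```

When F is significant, pairwise post hoc comparisons such as the LSD test reported in the table identify which groups differ.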

Table 6.

Differences in constructs by data sharing experience.

Constructs	Yes(N=219)		No(N=141)		T	P
Constructs	Mean value	Standard deviation	Mean value	Standard deviation	T	P
Trust	4.142	0.584	3.723	0.859	5.076	<0.01
Sharing intention	4.139	0.522	4.012	0.919	1.489	>0.05
Individual innovation	3.997	0.603	3.697	0.898	3.487	<0.01
Self efficacy	4.027	0.649	3.736	0.932	3.242	<0.01
Sharing behavior	4.048	0.607	3.528	0.930	5.879	<0.01
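The t statistics in Table 6 can be approximated from the reported group means, standard deviations and sample sizes. The source does not state whether equal variances were assumed; the sketch below uses the unequal-variance (Welch) form as an assumption, which comes close to the reported values (about 5.08 versus 5.076 for trust and about 5.88 versus 5.879 for sharing behavior, with the small gaps attributable to rounded inputs). The function name is illustrative.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and approximate degrees of freedom from summary data."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics from Table 6: with experience (N=219) vs. without (N=141).
trust_t, trust_df = welch_t(4.142, 0.584, 219, 3.723, 0.859, 141)
behavior_t, behavior_df = welch_t(4.048, 0.607, 219, 3.528, 0.930, 141)
print(round(trust_t, 2), round(behavior_t, 2))
```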

6

Discussion

6.1

Factors influencing data sharing by medical data users

The tests of H1 and H2 indicate that performance expectation (β=0.195, P<0.05) and effort expectation (β=0.336, P<0.001) have significant positive effects on the sharing intention of medical data users, consistent with previous research (Ren et al., 2024). According to expectancy theory, individuals weigh the expected utility and potential return of a behavior when making decisions (Chen & Huang, 2011; Kolekofski & Heminger, 2003). In clinical research, medical data sharing may be closely tied to research output and to the efficiency of collaboration within medical teams. Therefore, when users believe that data sharing can meet their data needs and improve the efficiency of clinical research, this positive expectation strengthens their willingness to share. Effort expectation has a larger effect on sharing intention than performance expectation (β=0.336 versus β=0.195), which suggests that the ease of use of the medical data sharing platform weighs more heavily in clinical researchers’ decisions. In other words, clinical researchers attend not only to the quality and potential value of data but even more to how conveniently and efficiently the data can be obtained and used.

H3 was not supported, whereas H6 was. Trust has no significant effect on the sharing intention of medical data users, but it has a significant positive effect on sharing behavior. This difference can be explained by the intention-behavior gap (“willing-behavior gap”): although medical data users are willing to share data, they may delay or abandon the actual sharing because of risks such as platform security and data authenticity. Trust then becomes the key bridge across this gap. At the stage of intention formation, medical data users are driven mainly by rational cognitive assessment and form their willingness to share by weighing the benefits and costs of the behavior, so trust has no significant effect. Three reasons may account for this. First, professional motivation has a compensatory effect: researchers’ pursuit of data and of better research can offset the risk arising from a lack of trust. Second, risk perception is weak at this stage: intention formation does not yet involve specific data sharing operations, so users’ assessment of data sensitivity risks has not been activated. Third, institutional norms act as a safeguard: organizational support and the transparency of data sources reduce decision-making uncertainty and thereby weaken the direct effect of trust. When medical data users reach the execution stage and face the actual practice of medical data sharing, trust significantly reduces their concerns about data authenticity and platform security and directly affects whether and how they share. 
On the one hand, medical data users tend to choose data providers, platforms or channels they trust. On the other hand, trust helps them understand, adapt to and manage restrictions such as data access rights and usage agreements within the established policy framework, which reduces perceived risk during execution and promotes the conversion of sharing intentions into safe and effective behaviors.

The tests of H4 and H5 show that community influence (β=0.175, P<0.05) and data transparency (β=0.181, P<0.05) have significant positive effects on the sharing intention of medical data users. A UTAUT-based study found that community influence has a significant positive effect on college students’ intention to continue using online teaching (Qian, 2022), which is consistent with this result. Through a series of experiments, Tajfel (1982) proposed the concept of “social identity”: an individual’s recognition that he or she belongs to a particular social group, together with the emotional and value significance of that membership. Its core is that individuals build their self-image through group belonging and derive emotional satisfaction and a sense of value from it. Through interaction with the community, users come to perceive data sharing as an approved or expected behavior, which strengthens their willingness to share. The significance of data transparency shows that when users clearly understand the process, purpose and use of data sharing, their willingness to share increases. The Transparency and Openness Promotion guidelines include a data transparency dimension, and a significant positive correlation between data transparency and the total citation frequency of journals has been reported (Liang et al., 2020). Greater transparency reduces users’ uncertainty and perceived risk, and thus makes them more willing to take part in data sharing.

The tests of H7, H8 and H9 show that sharing intention (β=0.199, P<0.01), individual innovation (β=0.249, P<0.001) and self-efficacy (β=0.240, P<0.001) have significant positive effects on the sharing behavior of medical data users. The effect of sharing intention confirms the positive relationship between intention and behavior: higher sharing intention promotes sharing behavior, which is consistent with previous studies (Zhi et al., 2021). The effect of individual innovation indicates that more innovative medical data users are more inclined to engage in data sharing, in line with the view that such users are more likely to try and adopt new technologies (Rogers, 2003). Innovation therefore affects not only individuals’ acceptance of new technologies but also their behavior in practice, which provides a psychological and behavioral basis for promoting medical data sharing. The effect of self-efficacy indicates that users’ confidence in their own ability has an important influence on their sharing behavior, and previous studies have also confirmed the positive effect of self-efficacy on user behavior (Zhao, 2011). This effect stems from users’ confidence in their abilities, including their ability to cope with difficulties and to take part in the sharing process.

6.2

Individual differences in influencing factors of medical data sharing

Trust, individual innovation and sharing behavior differed significantly across groups with different years of research experience. One-way ANOVA showed that researchers with 6-10 years of experience scored significantly higher on trust and individual innovation than those with less than 5 years, and they were more active in medical data sharing. For individual innovation, the mean for researchers with 11-15 years of experience (4.061) was also higher than that for researchers with 6-10 years (3.982), which suggests that clinical researchers’ innovative consciousness in data sharing tends to grow with research experience. Trust (F=4.323, p<0.01), individual innovation (F=3.21, p<0.05) and sharing behavior (F=5.481, p<0.01) all differed significantly among the groups. Clinical researchers with long research careers have accumulated rich experience through continuous practice and learning and have a deeper understanding of the process, risks and benefits of data sharing, which helps to strengthen their trust in data sharing platforms and data providers. As individuals’ roles in research teams evolve, especially when they take on leadership roles, their awareness of the importance of data sharing grows, which in turn stimulates innovative consciousness and behavior. Furthermore, researchers with more years of experience have lived through multiple policy and institutional changes and show greater adaptability and flexibility toward data sharing policies, which further influences their sharing behavior. Together, these factors indicate a complex relationship between years of research experience and data sharing behavior.

Occupational identity was associated with significant differences in trust (F=10.223, p<0.01), individual innovation (F=4.313, p<0.01), self-efficacy (F=2.782, p<0.05) and sharing behavior (F=5.092, p<0.01). Researchers had the highest mean on the trust dimension, possibly because they rely on data sharing to drive their research. Medical students had lower mean scores than clinicians and researchers on all five constructs and were less engaged and motivated. LSD post hoc tests confirmed significant differences between medical students and the other occupational groups, indicating that occupational identity has a key influence on medical data sharing behavior and that clinicians and researchers have a stronger sense of innovation and greater confidence in overcoming difficulties in sharing practice. Different occupational groups depend on data sharing to different degrees and have different professional demands, which leads to different understandings of the data sharing process, rules and potential value. This in turn produces significant differences in trust, self-efficacy, innovative awareness and sharing behavior. Together, these factors shape the behavioral patterns and psychological characteristics of each professional identity in medical data sharing and help explain the differences between them.

At present there is no unified international standard for grading medical data, and countries and regions develop grading systems according to their own conditions. For example, in the United States, the Health Insurance Portability and Accountability Act (HIPAA) determines the sensitivity and protection requirements of data, classifying data into three levels. China classifies medical data into five levels based on its importance, level of risk and the potential harm to data subjects. ANOVA of the direct influencing factors by data level showed that only sharing intention (F=2.608, p<0.05) differed significantly among data level groups, while trust, individual innovation, self-efficacy and sharing behavior did not. Specifically, users who required Level 2 data had the lowest mean sharing intention, while users who required Level 3 and Level 4 data had higher means. A possible explanation is that Level 2 data (general population and health service information) has low sensitivity and limited scientific value, so willingness to share it is low, whereas Level 3 data (partially identifiable information) and Level 4 data (complete health data) are more sensitive but have greater research value, so willingness to share is higher. This indicates that data classification is associated with sharing intention, and future research and policy should balance data sensitivity and availability to promote the safe sharing and effective use of medical data.

The independent-samples t-test showed that trust (t=5.076, p<0.01), individual innovation (t=3.487, p<0.01), self-efficacy (t=3.242, p<0.01) and sharing behavior (t=5.879, p<0.01) differed significantly between clinical researchers with and without data sharing experience, whereas sharing intention did not (t=1.489, p>0.05). On each of the four significant constructs, the mean for researchers with sharing experience was higher than that for researchers without it, indicating that sharing experience is positively associated with medical data sharing behavior. A likely reason is that experienced researchers understand the process, risks and benefits of data sharing more clearly and have accumulated more skills and methods in practice; they therefore show higher trust, innovation and self-efficacy and take part in sharing more actively. The accumulation of sharing experience can strengthen the participation and enthusiasm of clinical researchers and provides an important reference for optimizing data sharing mechanisms.

7

Conclusion

Medical data sharing behaviors may vary across scenarios and groups, so it is crucial to study data sharing behaviors and their internal and external motivations in clinical research scenarios from the perspective of medical data users. This study is the first to integrate the UTAUT model, trust theory and self-efficacy theory in clinical research scenarios. We introduced the concepts of data transparency and individual innovation, constructed and validated the CRS-USB conceptual model, and extended the UTAUT model. The validated CRS-USB model considers the effects of both internal and external factors on the sharing behavior of medical data users, and the results show how these factors operate. The findings help stakeholders in medical data sharing understand sharing behavior in clinical research scenarios.

This study provides new theoretical insights into the factors influencing the sharing behavior of medical data users in clinical research scenarios. Among the internal motivators, effort expectation has a greater effect on sharing intention than performance expectation; individual innovativeness and self-efficacy have a greater effect on sharing behavior than trust; and trust has no significant effect on sharing intention. Among the external motivators, community influence and data transparency both positively influence sharing intention. Individual difference analysis showed that trust, sharing intention, individual innovativeness, self-efficacy and sharing behavior differed significantly across user groups. In addition, by taking clinical research as its scenario, this study extends the understanding of the sharing behavior of medical data users beyond previous studies, which mainly focused on data sharing between doctors and patients.

In practice, this study analyzed the internal and external motivations of medical data users to share data in clinical research scenarios. It assessed differences between user groups defined by years of research experience, professional identity, data level requirements and sharing experience. For example, users with 6-10 years of research experience have a higher level of trust than those with less than 5 years of experience. Medical students score lower than clinicians and researchers on participation, trust, innovation and self-efficacy. In addition, data classification is significantly associated with sharing intention, and users with sharing experience are more likely to participate in sharing again.

The conclusions reveal the internal and external drivers of medical data sharing behavior in clinical research scenarios and their key roles. On this evidence, the following optimization paths can be pursued in practice. First, establish a finer classification and grading standard for medical data based on data transparency and differences in demand. Given the positive effect of data transparency on willingness to share and the differences in data level demand among groups, the National Medical Data Center and the ethics committees of tertiary hospitals could, under government leadership, jointly formulate stratification norms for clinical research data that classify medical data into three categories: basic data resources, business data resources, and thematic data resources. These norms should clearly define the desensitization standards and permission allocation schemes for each type of data and adjust the protection intensity automatically as the data environment changes. Based on the contribution of medical data to research applications, the volume of high-value data shared should be incorporated into the evaluation system for applications to key research and development plans, so as to balance data utility and security requirements. Second, establish a flexible platform for data sharing, submission and use. Because effort expectation affects willingness to share more strongly than performance expectation, and medical students face significant barriers in adapting to the technology, the interaction architecture of the data sharing system should be redesigned: intelligent field mapping and a modular operation interface can reduce the complexity of use, and a stepped task guidance program can be developed for medical students. 
A platform performance diagnosis mechanism should also be established to analyze process breakpoints and time-consuming bottlenecks in user operation logs on a regular basis and thereby optimize key links such as data submission verification. Third, establish a dynamic mechanism driven by community and motivated by innovation. Considering the promoting effect of community influence on sharing intention and the key effect of individual innovation on sharing behavior, it is suggested that the Chinese Medical Association take the lead in establishing a clinical research consortium and set up a special fund to support young scholars. The innovative contributions of data models should be incorporated into the evaluation indicators of research performance, so that the academic community effect stimulates innovation and overcomes the incentive limitations of traditional institutional trust.

This study has some limitations. The empirical data come from Chinese clinical researchers, so further research is needed to verify the generalizability of the results. In addition, this study explores the factors affecting medical data sharing in clinical research scenarios from the user’s perspective only. To investigate the gaps in medical data sharing practice more comprehensively, future work will analyze the driving forces behind medical data sharing in clinical research scenarios from the perspectives of controllers, supervisors and processors, and will examine the internal mechanisms of medical data sharing as a multi-subject system.

Meng Zhang

Dongmei Mu

Ping Wang

Article category: Research Papers

Published: 06 Sep 2025

Received: 02 Apr 2025

Accepted: 26 Aug 2025

DOI: https://doi.org/10.2478/jdis-2025-0049

Keywords: Clinical research scenarios, Medical data, Sharing behavior, Data user, UTAUT

© 2025 Meng Zhang et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
