
INTRODUCTION

Technology is developing at an unprecedented pace. Technological advances are affecting all social and market spheres, starting with how individuals communicate with each other and with government institutions. Communication is shifting from in-person and phone conversations to social media, chatbots and automated registration systems. Technological improvements are also being introduced in various market segments, such as automated robots that move goods in warehouses and factories, systems that translate texts, systems that generate new drugs and many others. New technological solutions automate many areas, increasing work speed, efficiency and precision. However, these solutions also create new challenges for society as a whole, such as the need to retrain workers in certain fields because the integration of automated devices threatens their employment. This raises questions about establishing a fixed passive income for society (Universal Basic Income) and about the regulation of data subjects' rights: specifically, in which cases a data subject has the right to allow or prohibit the use of their data, and whether the data subject has property rights over their data or over data created on the basis of them.

Regarding medical care, it should be noted that patient care has made significant leaps forward in recent years. Several factors have contributed to this development, including the COVID-19 pandemic, which accelerated the integration of various healthcare solutions, such as telemedicine, remote patient monitoring mechanisms, digital therapy solutions, the use of digital devices in preventive treatment, and artificial intelligence (AI) technologies in medicine.

Considering the rapid development of so many different technologies in a single field in such a short period, it is only reasonable to analyse the compliance of new technologies with existing regulations according to the ex ante principle as quickly and accurately as possible to identify potential violations of rights and address any shortcomings.

This paper will examine the legal aspect of using data to train AI in the medical field and seek an answer to whether a data subject has the right to compensation for using their data in training AI.

The paper will address questions such as:

What are data, and can data be considered property?

How are data used to train AI systems?

In what cases are a patient's rights to manage their data limited?

Are data used to train AI subject to property rights or compensation for their use?

RESEARCH RESULTS AND DISCUSSION
Data ownership

The word ‘data’ is a Latin plural noun that derives from the word ‘datum’, which means ‘to venture to give’ or ‘something given’ (Vulis, 2020). There are many types of data, but primarily they are characters and symbols whose meaning can vary depending on the context. For example, 22222222 can be a phone number or an invoice number, among other possible explanations for this combination of digits.

In the European Union, personal data are rarely treated as objects of property rights. Personal data are generally defined as information relating to an identified or identifiable natural person, including a set of data that together can identify a particular person.

The legal dictionary explains personal data as any information about an identifiable natural person (Amona, 2004: 238). The General Data Protection Regulation (GDPR) does not exhaustively enumerate the personal data relating to a natural person. However, personal data are conventionally associated with a name, surname, home address, personal identification number, data stored in medical institutions, and other data.

According to Article 15(1) of the GDPR, the data subject has the right to obtain confirmation from the controller as to whether or not personal data concerning him or her are being processed and, where that is the case, access to the relevant data (Regulation (EU) 2016/679). In other words, if an organisation processes specific personal data, the person has the right to request access to his or her data and to receive a copy of the data being processed. An open question is whether data stored in the cloud or on a computer can be considered property or things in the civil law sense (Leimanis and Palkova, 2021).

The term ‘property’ means anything in which a person has rights or interests, and it typically has a variable value. The owner's property is generally expressed as assets, corpus, real property or equity capital (Hopkins, 2015: 331). Article 841 of the Civil Law of the Republic of Latvia states that things are corporeal or incorporeal. Incorporeal things are various personal, property and obligation rights to the extent that they are property components (Civillikums, 1992). Given the technological aspect, treating personal data as corporeal property would be challenging, but precedents could be established for treating them as incorporeal property. For example, incorporeal property also includes obligation rights, which have no physical form and arise solely from the existence of the obligation. This understanding of personal data fits well with the position of consumers and consumer rights advocates, who rely on the moral argument that personal data are closely linked to individuals, as they primarily arise from individuals (Will.i.am, 2019).

In the context of property, it is essential to determine whether personal data that a person has obtained from an organisation, and personal data that are stored in a person's cloud, can be considered property in the civil law sense. Just as organisations have been able to anonymise collected data by removing personal identifiers that would link the data to a specific person, individuals should be able to exercise property rights over the data that they store in the cloud (Jurcys et al., 2020).

From a civil law perspective, data obtained from an organisation and stored in a person's cloud are considered a new object (Jurcys et al., 2020). For example, data requested in the United States are sent in JSON format. JSON is a standard format for transferring data in the data science community; however, because consumers need to be able to understand the data they have obtained from an organisation, it is questionable whether this format would be considered suitable under the GDPR. One of the problems with treating personal data as property lies in the different levels of personal data and the place where the data are stored. This lack of clearly defined boundaries was one of the main reasons why scientists and practitioners had difficulty recognising property rights in personal data.
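The readability concern about JSON exports can be made concrete with a small sketch. It assumes a hypothetical export: the payload and its field names are invented for illustration and do not reflect any real organisation's format. Under that assumption, nested JSON can be flattened into plain 'label: value' lines that a non-specialist could read more easily.

```python
import json

# Hypothetical export payload; the structure and field names are invented.
raw_export = (
    '{"name": "Jane Doe", '
    '"contact": {"email": "jane@example.org"}, '
    '"visits": [{"date": "2022-01-10", "clinic": "A"}]}'
)


def flatten(value, prefix=""):
    """Yield (path, leaf) pairs for every scalar in a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}{key}.")
    elif isinstance(value, list):
        # List positions are numbered from 1 to read naturally.
        for index, child in enumerate(value, start=1):
            yield from flatten(child, f"{prefix}{index}.")
    else:
        yield prefix.rstrip("."), value


def readable_summary(export_text):
    """Parse a JSON export and return one 'path: value' line per scalar."""
    data = json.loads(export_text)
    return [f"{path}: {leaf}" for path, leaf in flatten(data)]


summary = readable_summary(raw_export)
for line in summary:
    print(line)
```

Whether such a flattened listing would satisfy any legal intelligibility requirement is not something the sketch can answer; it only shows that the same data can be presented in a more or less readable form.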

We cannot consider all personal data as property in civil law, as it is difficult to determine the beginning and end of personal data. However, personal data obtained from an organisation and stored in a person's cloud are considered property.

Big data and business entity interest

Big data refers to sets of data that are so large and complex that new technologies, such as machine learning (ML), are necessary to process them (Lielie dati: definīcija, priekšrocības, sarežģījumi, 2019). Big data can be obtained from various sources, which are later aggregated, such as uniform data on website statistics, application usage statistics and other data types.

The use of big data in healthcare is primarily associated with clinical data sets. Analysing large clinical data sets, such as anonymised patient medical histories or data input by patients in apps, can help improve the quality of diagnosis, treatment and drug development while reducing costs (Lielie dati: definīcija, priekšrocības, sarežģījumi, 2019).

Regarding commercial interests, before big data, businesses could only use a small number of data sets in analytics applications. Other data sets were often pushed aside as so-called ‘dark data’, which were processed and stored but not used further.

Businesses are increasingly using big data to help drive better business strategies. According to a survey conducted in 2021 by ‘NewVantage Partners’ with 94 large IT companies and business leaders, 91.7% increased their investments in big data projects and other data and ML initiatives (Stedman, 2022).

Big data has expanded the types of data analytics that businesses can use. Big data provides more significant opportunities for ML, predictive analysis, data mining, stream analysis, text mining, and other data science and progressive analytics disciplines. Using these disciplines, big data analytics applications help businesses understand the environment in which they operate, such as customer needs statistics. If big data analytics is successful, a business's commercial activity is promoted, resulting in increased revenue, which is the primary goal of commercial activity.

In today's competitive environment, big data is essential because it has become a fundamental element of strategy for businesses, providing significant advantages in performing commercial activities in various fields, including healthcare and training of ML systems.

The use of data obtained from data subjects in medicine when training AI

The term ‘artificial intelligence’ was introduced in 1956 by John McCarthy. AI is a subfield of computer science that deals with automating intelligent behaviour. AI is also defined as research on how to make computers do things that humans currently do better, or on computational processes that allow for perception, reasoning and action. Another view holds that AI is the study of the formal properties of problems and the methods for solving them, and the part of computer science that studies symbolic, non-algorithmic reasoning processes and the representation of symbolic knowledge in order to provide intelligent behaviour in computer systems (Grundspeņķis, 2022).

There is still no single, universally accepted definition of AI, and different sources define it differently. For example, AI is the ability of a system to correctly interpret external data, learn from such data and use this learning to achieve specific goals and tasks through flexible adaptation (Kaplan and Haenlein, 2019). Alternatively, AI refers to systems developed by humans that operate in the physical or digital world, perceiving their environment, interpreting structured or unstructured data, reasoning based on acquired knowledge and choosing the best action (according to predetermined parameters) to achieve the goal. AI systems can also be designed to learn to adapt their behaviour by analysing how the environment is affected by their previous actions (The European Commission, 2018).

According to the European Union's initiative ‘Proposal for a Regulation of the European Parliament and of the Council on a European approach for AI (Artificial Intelligence Act) and amending certain Union legislative acts’ of 29 November 2021, the latest proposed definition of AI is: AI is a system that (i) receives machine and/or human data and input, (ii) determines how to achieve a set of human-defined goals using learning, reasoning or modelling methods and approaches listed in Annexe I (see text below), and (iii) generates outputs in the form of content (generative AI systems), predictions, recommendations or decisions that interact with and impact the environment (Council of the European Union, 2021).

All of the proposed definitions are valid, but in the authors' view, the main difference between them is the range of systems that they cover. Specifically, one definition states that an AI system can only be created by a human, whereas the other does not limit creation to human involvement (Nguyen et al., 2023). One definition is more general, while the other is narrower, attempting to define the characteristics, functions and types of such systems more precisely. The definition cannot be narrowed, given how rapidly AI systems are developing: it is impossible to predict what types of AI systems will be created and what functions they will perform. In the authors' view, the latest proposed version of the AI definition in the EU is broad enough to apply to all currently used systems that perform automated data processing and, in the case of simpler systems, draw conclusions or offer solutions based on it. It is essential to note that the annexe to the EU proposal of 29 November 2021 includes a list of specific methods that expands on the second point of the definition. The content of the annexe can be divided into three proposed approaches and associated methods:

ML approach, including supervised, unsupervised and reinforcement learning, using various methods, including deep learning;

Logic and knowledge-based approach, including knowledge representation, inductive (logical) programming, knowledge bases, inference and deductive engines, reasoning, and expert systems;

Statistical approaches, Bayesian inference, search, and optimisation methods.
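As a rough illustration of what 'supervised learning' in the first approach involves, the sketch below trains a very simple classifier on labelled examples and then applies it to a new case. The nearest-centroid method, the feature values and the labels are all assumptions chosen for brevity; the data are synthetic, and nothing here is taken from the annexe or from any real medical system.

```python
from statistics import mean

# Synthetic, invented training examples: (feature vector, label).
training = [
    ((1.0, 1.2), "low"),
    ((0.8, 1.0), "low"),
    ((3.0, 3.2), "high"),
    ((3.4, 2.8), "high"),
]


def train(examples):
    """Return one centroid (mean feature vector) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(model, key=lambda label: squared_distance(model[label]))


model = train(training)
prediction = predict(model, (2.9, 3.0))
print(prediction)
```

The point of the sketch is only that the 'knowledge' of such a system (here, the two centroids) is computed entirely from the examples supplied to it, which is why the origin of those examples matters for the discussion that follows.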

Patient interests

The patient's main interest is to receive high-quality healthcare as quickly as possible, in a way that ensures the patient's health is protected from dangers, side effects or other potential negative consequences as much as possible.

To serve the patient's interests, AI can improve the healthcare process by providing faster healthcare, ensuring continuous prevention and improving the quality of services provided, whether in the operating room or in diagnosis. However, for AI systems to be integrated into the healthcare process, patient data are necessary so that AI can be ‘trained’ and draw conclusions based on those data. This brings us to the patient's other interests, which concern managing their data: knowing unambiguously who is processing and using their data and where, in which application the data are used, and for what purposes.

In the context of data protection, data can be divided based on the purpose of their use. In this work, the authors will divide data into two segments: one is data obtained for scientific, historical research or statistical purposes, and the other is data obtained for all other purposes.

Regarding data processing, and in particular the processing of sensitive data, it is necessary to consider the GDPR.

With regard to the purposes of the first segment, it should be noted that according to GDPR recitals 33, 50, 52, 53, 62, 65, 113, 156, 157, 159, 161 and 162 and Articles 5, 9, 14, 17, 21 and 89, data collection and processing are subject to ‘softer’ rules. If data are collected for these purposes, the GDPR reduces the scope of obligations imposed on the data controller and, to some extent, limits the data subject's rights to control their data. This implies that, for example, a data controller pursuing scientific research purposes has certain advantages over a commercial entity that collects data for its commercial activities. On the other hand, consider what happens when data legitimately obtained for project X are used for their intended purpose, namely training an AI system to identify people with cancer pre-emptively and accurately, before the disease progresses and is diagnosed, and the resulting system is then sold to hospitals. In such a case, why can the individuals whose data were used to create this product not claim ownership over their data as the owners of the fruit of their labour, which is the AI system's ‘knowledge’? Of course, given the enormous volume of data required to train such a system and the vast number of data subjects whose data were used in the development process, each data subject's contribution may seem minimal. Nevertheless, in the authors' opinion, it is essential to recognise and respect each data subject's contribution.

The role of data in training AI

In today's digital age, data can be seen as the new oil of the economy: they drive, and form the basis of, innovative technologies and modern solutions (Chazan, 2016). At the same time, the technological devices that offer these solutions are based on acquired and processed data sets.

Different patient groups require different solutions that are based on different data sets. For example, AI algorithms developed to recommend appropriate pain medication for one group will not be accurate for patients with different conditions, such as patients with migraines or pregnant women. Pain medicines for cancer patients are numerous and varied, but they are mostly stronger than the alternatives that suffice to manage pain caused by other conditions and that have a less negative effect on the body. Besides a patient's ailments, other data (such as age, gender, ethnicity, race, sexual orientation and socio-economic status) must be taken into consideration when training an AI system, according to its functions, to ensure that the result is as precise and accurate as possible. These include sensitive data, which provide invaluable information as a basis for objective inferences, for instance, identifying the social groups that are more prone to certain diseases or ailments.

As Hal Wolf, President & CEO of the Healthcare Information and Management Systems Society (HIMSS), notes, one of the main problems with AI system solutions is the lack of standardised data processing approaches, which significantly increases project cost and implementation difficulty (Southwick, 2022). Data processing in the AI context means integrating medical data into an AI system for ‘training’. Difficulties arise because data ingested from various sources come in various formats, meaning that they need to be transformed before they can be used in the AI ‘training’ process.
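The transformation step described above can be sketched as follows. The two sources, their field names, date conventions and units are hypothetical and invented for illustration; a real integration would involve far more fields and validation. The sketch maps both inputs onto one assumed common schema before any training could begin.

```python
from datetime import datetime

# Hypothetical records from two invented sources with different conventions.
source_a = [{"patient_id": "A-1", "birth_date": "1980-05-17", "weight_kg": 70.0}]
source_b = [{"id": "B-9", "dob": "17/05/1975", "weight_lb": 176.0}]

LB_TO_KG = 0.45359237  # definition of the international avoirdupois pound


def from_source_a(record):
    """Source A already uses ISO dates and kilograms; only parse the date."""
    return {
        "patient_id": record["patient_id"],
        "birth_date": datetime.strptime(record["birth_date"], "%Y-%m-%d").date(),
        "weight_kg": record["weight_kg"],
    }


def from_source_b(record):
    """Source B uses day/month/year dates and pounds; convert both."""
    return {
        "patient_id": record["id"],
        "birth_date": datetime.strptime(record["dob"], "%d/%m/%Y").date(),
        "weight_kg": round(record["weight_lb"] * LB_TO_KG, 1),
    }


unified = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
for row in unified:
    print(row)
```

Each additional source needs its own mapping function, which gives a sense of why the absence of a shared standard raises the cost of every project.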

According to an interview with Emil Syundyukov, Chief Technology Officer and co-founder of Longenesis, one of the main issues for projects that plan automation solutions is data transferability: every project has to take into account the absence of a data infrastructure in Latvia.

Other representatives of the technology sector have also publicly pointed out the lack of standardisation in data transferability, which complicates scientific activity. Scientists face this problem not only in Latvia but also in other countries, such as the United States. Positive examples can be found in Finland and Belgium.

Finland has developed a dedicated law on the secondary use of health and social data. The purpose of the law is to facilitate efficient and secure processing of, and access to, social and health data for governance, monitoring, research and statistical purposes, as well as for development needs in the health and social sector. Its second purpose is to ensure legal certainty for individuals, including their rights and freedoms, when personal data are processed (Ministry of Social Affairs and Health of Finland, 2020). Secondary use of health data means that data on the data subject and registry data are used for a purpose other than the primary one for which they were originally stored.

Belgium, for its part, has initiated the Towards the Development of a National Health Data Platform (AHEAD) project to study how the data in existing Belgian health information systems could be integrated and made easier to use for scientific purposes and valorisation. On a larger scale, the goal is to contribute to the development of the country's national health data platform.

For successful linking or integration of various existing digital collections, the following are necessary:

identification of data stored in the Belgian Health Information System;

mobilisation of data holders;

examination of possibilities to create sustainable and historical connections between these data sources; and

defining possible technical, legal and ethical weak points to move away from the current status quo (Sciensano, 2020).

Data are essential for training an AI system, and their quality and availability are vitally important. It is therefore essential to create, at the national level, an environment that attracts new scientists, with a unified system of data transferability that facilitates the process of obtaining data. At the same time, Latvia should provide clearer and fairer access to patient data and enable patients to manage their data fully. Inspired by the examples of Finland and Belgium, the authors recommend that Latvia develop a similar system in which one responsible centralised agency manages all patients' medical records. Patients would have unrestricted access to their data and to the available studies they can apply for. Researchers would have a more straightforward process and form for accessing the data needed for their research: they would send one request to one agency, which would then provide details of the patients who meet the study criteria and are eligible to participate. Such a mechanism would allow researchers to take decisions on patient participation themselves and to submit their data in a unified format. The system would also have a commercial component, in which patients receive either a service or some financial benefit in return for their data, thereby asserting ownership over them.
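The 'one request to one agency' idea can be sketched in a few lines. Everything in the sketch is hypothetical: the Patient and StudyRequest types, the criteria fields and the consent flag are assumptions introduced for illustration, and the recommendation above does not specify any of them. The sketch shows a single registry answering a single structured request with a shortlist of matching patients.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    age: int
    diagnosis: str
    consents_to_contact: bool  # assumed opt-in flag, not taken from the source


@dataclass
class StudyRequest:
    min_age: int
    max_age: int
    diagnosis: str


def shortlist(registry, request):
    """Return IDs of patients who meet the criteria and allow contact."""
    return [
        p.patient_id
        for p in registry
        if p.consents_to_contact
        and request.min_age <= p.age <= request.max_age
        and p.diagnosis == request.diagnosis
    ]


# Invented registry entries for illustration only.
registry = [
    Patient("P1", 54, "migraine", True),
    Patient("P2", 61, "migraine", False),
    Patient("P3", 47, "asthma", True),
    Patient("P4", 33, "migraine", True),
]

eligible = shortlist(registry, StudyRequest(min_age=40, max_age=70, diagnosis="migraine"))
print(eligible)
```

Keeping the consent check inside the registry query, rather than leaving it to each researcher, is one possible design choice for giving patients control over whether they are approached at all.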

CONCLUSION

We cannot consider all personal data as property in a civil context since it is difficult to determine the beginning and end of personal data. However, personal data obtained from an organisation and stored within a person's device can be considered property. At the same time, by analogy, it can be concluded that there can also be property rights over data in other circumstances, depending on the data type and its storage place.

In today's competitive environment, big data has become an essential element of company strategy, providing significant advantages in various areas such as healthcare and ML system training.

In the authors' view, the data subject may have a right to a proportional share of products created from their data, akin to property rights over the fruits or ‘knowledge’ generated through ML systems.

The authors suggest that Latvia should provide more transparent and more equitable access to patient data and establish a means by which patients can fully exercise their rights over such data. Following the examples of Finland and Belgium, the authors recommend setting up a similar system in Latvia in which one responsible centralised agency oversees all patient medical records while giving patients unrestricted access to their records and to the available studies they may apply for. This would simplify the process for scientists seeking access to relevant datasets for their research: it would be enough to send one request to one agency, which would then shortlist suitable patients. Those patients would decide whether or not to volunteer in exchange for some service or economic benefit, thereby asserting title over their own data.

eISSN: 2256-0548
Language: English
Publication frequency: 3 issues per year
Journal subjects: Law, International Law, Foreign Law, Public International Law, Comparative Law, other, Public Law, Criminal Law