
Pseudonymisation of personal data as a technical and organisational measure for using data in the cloud



Introduction

Modern technological developments have significantly increased the processing, storage and exchange of personal data. The General Data Protection Regulation (GDPR) and the Law on the Processing of Personal Data set out how data are to be collected, stored and processed, but they offer relatively little practical guidance on secure storage, for example, when a data controller uses the cloud for its information management system. The use of third-party hosting services to store and process personal data often involves transferring to the cloud information about the data subjects concerned, such as user-profile data or other personally identifiable information. Under the GDPR, the transfer of third parties' “personal data” to other parties is permitted only to a limited extent; the data controller must therefore ensure proportionate technical and organisational measures with regard to the data subject. However, the transfer of information about third parties to the cloud might be permissible if potential processors cannot access the information at all or, despite having access, cannot know its content. The storage and processing of personal data in the cloud are discussed in more detail below, together with the question of whether pseudonymisation of personal data can effectively eliminate the reference to the data subject and thus allow unrestricted use of the cloud in line with the requirements of data protection.

Analysing the storage of personal data from the data subject's point of view, and considering the abstract concept of the cloud and the fact that the data subject loses physical control over data placed in the cloud, five main security threats can be highlighted: data exposure, unauthorised access, data loss, data manipulation and privacy breaches. Exposure and unauthorised access affect the data confidentiality requirement; data loss and data manipulation violate the data integrity requirement; and privacy breaches violate the privacy preservation requirement, in addition to violating both the data confidentiality and integrity requirements (Degambur et al., 2022). This article will analyse how to ensure secure storage and processing of personal data in the cloud from a security perspective.

Research results

To examine the storage and processing of personal data in the cloud, and to analyse whether the pseudonymisation of personal data can effectively eliminate the reference to the data subject and thus enable the cloud to be used without restrictions and in accordance with the requirements of data protection, grammatical and historical methods of interpreting legal norms will be used, together with an analysis of scientific articles.

The article provides an analysis of the relevant legal norms and addresses the following research question: “If personal data are stored and processed in the cloud, can the pseudonymisation of personal data effectively eliminate the reference to the data subject and thus enable the use of the cloud to be carried out without restrictions as well as in accordance with the requirements of General Data Protection Regulation (GDPR)?”

Personal data

The purpose of the Law on the Processing of Personal Data is

“to create legal preconditions for setting up of a system for the protection of personal data (hereinafter – the data) of a natural person at a national level by providing for the institutions necessary for such purpose, determining the competence and basic principles of operation thereof, as well as regulating operation of data protection officers and provisions of data processing and free movement”

(Law on the Processing of Personal Data, Article 2).

The concept of personal data is not separately defined in the context of Latvia, which means that personal data in Latvia can be considered as

“any information relating to an identified or identifiable natural person (‘data subject’)”

(Regulation (EU) 2016/679, Article 4(1)) based on the GDPR. From the above, it can be concluded that if information cannot be attributed to an identifiable person, that is, a data subject, it is not considered personal data. Based on Article 4(1) of the GDPR, a person is identifiable if he or she can be

“identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”

(Regulation (EU) 2016/679, Article 4(1)).

Thus, the above leads to the conclusion that when data are modified, for example by deleting an identifier, the reference to a person may no longer exist, and the data may not be considered personal data if the data subjects can no longer be identified after the identifier has been removed. However, even though the erasure of identifiers may be considered a form of de-identification, the re-identification of data subjects may still be possible using so-called “additional knowledge” (Simitis, 2011).

Additional knowledge and ways of reference checking

In assessing whether personal data constitute identifiable information, Recital 26 of the GDPR allows the conclusion that it is not necessary to take into account all the additional knowledge and means theoretically available to verify the reference of personal data to an identifiable person, but only objective factors, such as the cost of and time required for identification, taking into account the technology and technological developments available at the time of processing [Regulation (EU) 2016/679, Recital 26]. It is therefore possible to conclude that a reference to a data subject is already excluded if the likelihood of identification is so low that the risk of identification is practically non-existent. Identifiability should thus be assessed not on theoretical criteria but on actual probability: in theory, with the use of appropriate technologies and data matching, personal data will almost always be attributable to a person, but whether a particular person will actually be identified, given the excessive effort and resources involved, remains a matter of probability (Roßnagel, 2003).

Based on Article 32 of the GDPR, it is possible to assess the security of processing,

“Taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons, the controller and the processor shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk (…)”

(Regulation (EU) 2016/679, Article 32); the security check on processing is thus carried out based on a trade-off between extreme and average values, which must be proportionate on the part of the controller. The time, cost and effort involved must be weighed against the potential risk. This effort can usually be measured in terms of the value of the personal data, for example by quantifying the economic benefit that can be derived from the data in a given database (Gola & Schomerus, 2012). When scoping the effort invested in technical and organisational measures, it should also be considered whether the personal data are static or dynamic, since the amount of data identifiable with a particular person may increase over time. For example, user-profile information that initially consists only of a name and an email address, subject to data protection by design and by default [Regulation (EU) 2016/679, Article 25], may later be supplemented, at the data subject's choice, by a personal image, a home address, a workplace, personally identifiable documents and other information. The data controller must therefore keep its technical and organisational measures up to date, paying due attention to the possibility that, as the volume of data increases, data currently classified as anonymised information may in the future become personal data through the attachment of additional identifiers (Simitis et al., 2019).

Identifier as a reference to the data subject

While the GDPR requires proportionate measures when establishing technical and security measures for the processing of the personal data in question, the question arises whether the assessment of what additional knowledge can still be obtained with reasonable effort should be made in relative or absolute terms. Under the prevailing view, a relative reference to the data subject exists only where there is knowledge about the specific person (Hansen & Meisner, 2007). Among other things, if a reference identifies, for example, the group of employees of a company to which a particular data subject belongs, this does not necessarily allow the immediate identification of that data subject; rather, identification depends on the context, that is to say, on the relevant resources and the existing additional knowledge available. A contrary view may be justified by reference to Recital 26 of the GDPR, according to which any information that may additionally be used by a third party must also be considered when determining the data protection principles (Pahlen-Brandt, 2008).

Recital 26 of the GDPR does not specify to what extent information from other sources must be analysed, whether this information must be linked to the controller to any extent, or what the source of the information is. However, the risk of data transfer does not necessarily entail the data protection risks feared by critics of the relational approach (Pahlen-Brandt, 2008). Even if one party holds additional information that would allow indirect identification of an individual through the identifiers provided, a party wishing to delete identifiers should not assume that other parties lack access to similar information and will be unable to identify data subjects. Beyond the relative reference to persons, the possibility of further transmission of “personal data” to third parties in anonymous form should also be considered: if not all identifiers are properly hidden, such transmission may significantly broaden the sources of information and allow the identification of data subjects. If, for example, anonymised data are published on the internet by one of the parties involved, the likelihood of identifying a person from partially anonymised information increases; thus, any additional knowledge available on the internet that could form a reference to a specific data subject should be considered when anonymisation is carried out (Caspar, 2009).

Anonymisation and pseudonymisation

Based on data protection by design and data protection by default, several technical and organisational measures are in place (Regulation (EU) 2016/679, Article 25). Some examples of technical and organisational measures include pseudonymisation and anonymisation of personal data. Anonymisation and pseudonymisation are methods of removing identifying features from personal data, the difference being that anonymisation is an irreversible process, while pseudonymisation, with access to a cryptographic key, allows personal data to be returned to their original state.

Personal data may be considered anonymous if their content can no longer be linked to specific individuals, or can be linked to them only with disproportionate effort (time, cost, manpower). For anonymisation to be considered a safe and effective method, identifiers must be erased or changed in such a way that the data subjects can no longer be identified. Anonymous data simply lack the ability to be linked to specific data subjects, either from the outset or because the identifiers were removed at some point during anonymisation (Roßnagel & Scholz, 2000). Even with the availability of online resources and the large amount of personal data stored on different digital platforms, anonymisation can be considered a proven method of meeting current data protection requirements, notwithstanding widely available data analytics capabilities (Weichert, 2013). Anonymisation permanently removes the reference to the data subject, and data protection law is not applicable to further processing (Simitis, 2011; Ievina, 2022).

Pseudonymisation involves altering personal data in such a way that the data subject can no longer be identified from the data after pseudonymisation without knowing or using the encryption key. Given that the encryption key is known only to the controller, the personal data may be restored to their original state at any time, if necessary. As pseudonymisation retains an implicit reference to the data subject, such data, unlike anonymised data, are subject to the requirements of the GDPR. On the other hand, if the controller who has effective control of the pseudonymised data no longer has access to the encryption key, the personal data are no longer any different from relatively anonymised data, to which the requirements of the GDPR do not apply (Härting, 2013); it is therefore necessary to assess the applicability of data protection requirements from different perspectives in the case of pseudonymised data.
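The reversibility that distinguishes pseudonymisation from anonymisation can be sketched in a few lines of code. The following is an illustrative sketch only: the function names and the controller-side lookup table are assumptions, not a prescribed method, and a keyed hash stands in for a full cipher.

```python
import hashlib
import hmac
import secrets

# Illustrative sketch (assumed names): the controller derives a pseudonym
# with a secret key and keeps a separate re-identification table, so only
# the controller can reverse the mapping (cf. Article 4(5) GDPR's
# requirement that additional information be kept separately).

key = secrets.token_bytes(32)   # known only to the controller
lookup = {}                     # controller-side table, stored separately

def pseudonymise(identifier: str) -> str:
    pseudonym = hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()
    lookup[pseudonym] = identifier   # enables reversal by the controller only
    return pseudonym

def re_identify(pseudonym: str):
    # Without the controller's key and table, the pseudonym is effectively
    # anonymous to third parties.
    return lookup.get(pseudonym)

p = pseudonymise("jane.doe@example.com")
assert re_identify(p) == "jane.doe@example.com"
```

From a third party's perspective, without `key` and `lookup` the pseudonym is indistinguishable from anonymised data; if the controller deleted both, the process would become effectively irreversible, i.e. anonymisation.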

Pseudonymisation of personal data in the cloud

If the data controller of the cloud holds personal data that cannot be identified by others, or modifies the data so that they can no longer be linked by others to specific data subjects, the data may be considered pseudonymised from the controller's perspective but anonymised from the perspective of any third party who does not know the algorithm by which the personal data were pseudonymised. The question arises whether such pseudonymised personal data can be stored and processed in the cloud without being subject to the provisions of the GDPR and the Law on the Processing of Personal Data. The transfer of pseudonymised personal data to the cloud would be neutral from a data protection law perspective if it can be ensured that the encryption key is accessible only to the controller, or that it is stored, for example, on another cloud or split across multiple clouds, and is not directly accessible to a potential attacker in the event of a personal data breach. If the personal data are pseudonymised and the maximum security requirements for the storage of the encryption key are met, the transmission of personal data to the cloud can be considered neutral and compliant with data protection requirements, since in theory only encrypted data would be transmitted, processed and stored in the cloud. As mentioned above, neither the cloud service provider nor other cloud users can access the processed data; even if the data could be accessed during a cyber-attack, they cannot be returned to a readable personal data format without the encryption key. According to the results of the relative personal reference, pseudonymisation can under certain circumstances be supported by legal prohibitions on processing and disclosure in the context of technical and organisational data protection measures. It will therefore be further examined whether and how the data controller can exclude the reference to the data subject in relation to other entities.
For this purpose, cryptographic possibilities should be explored further, as previous studies have concluded that the data subject's confidence in the protection of their personal data in the cloud could be strengthened by stronger confidentiality guarantees, which can be achieved through data pseudonymisation combined with the use of several independent cloud hosting services and the application of secret-sharing approaches to the pseudonymised data before it is shared (Tatiana et al., 2016).
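The secret-sharing idea mentioned above can be illustrated with a minimal n-of-n XOR scheme. This is a hedged sketch under assumed names; real deployments of the kind the cited study discusses would use more elaborate threshold schemes.

```python
import secrets

# Minimal n-of-n XOR secret sharing: each cloud host stores one share,
# no single share reveals anything about the data, and all shares are
# required to reconstruct the pseudonymised record.

def split(data: bytes, n: int) -> list:
    shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    final = data
    for s in shares:                       # final = data XOR r1 XOR ... XOR r(n-1)
        final = bytes(a ^ b for a, b in zip(final, s))
    return shares + [final]                # one share per hosting service

def combine(shares: list) -> bytes:
    out = shares[0]
    for s in shares[1:]:                   # XOR of all shares restores the data
        out = bytes(a ^ b for a, b in zip(out, s))
    return out

record = b"pseudonymised payload"
parts = split(record, 3)
assert combine(parts) == record
```

A breach of any single hosting service yields only uniformly random bytes; confidentiality fails only if every host is compromised simultaneously.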

Process and types of pseudonymisation

The process of pseudonymisation consists of creating a key that converts an identifier, such as a name, into ciphertext, so that the data can continue to be used without exposing other identifiers such as an email address or telephone number. Once the process is complete, decryption transforms the ciphertext back into the original text. Symmetric encryption processes use the same key for both encryption and decryption. In contrast, asymmetric encryption processes use a key pair: one (public) key is used for encryption and a different (private) key is used only for decryption (Varanda et al., 2021). Cryptography means

“mathematical methods and techniques that can be used to protect information against unauthorised disclosure or deliberate manipulation”

(IT-Grundschutz – BSI, 2011).
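The symmetric case can be illustrated with a deliberately simplified sketch. This is a toy keystream built from a hash function, not production cryptography; a real system would use a vetted cipher such as AES.

```python
import hashlib
import secrets

# Toy symmetric scheme (illustration only): the SAME key both encrypts
# and decrypts, because XOR with the same keystream is self-inverse.

def keystream(key: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

key = secrets.token_bytes(32)
ciphertext = xor_cipher(key, b"name: Jane Doe")
assert xor_cipher(key, ciphertext) == b"name: Jane Doe"   # same key decrypts
```

In the asymmetric case, by contrast, the encryption key could be published while only the holder of the separate private key could decrypt; the Python standard library offers no asymmetric primitives, so that case is not sketched here.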

The basic idea of cryptographic pseudonymisation is the conversion of text into symbols or letters according to a mathematical operation that a potential “attacker” cannot solve, or can solve only with difficulty (IT-Grundschutz – BSI, 2011). For example, “brute force attacks” are carried out to crack cryptographically pseudonymised data: high-performance computers try to calculate the key to a pseudonymised document by trying all possible key combinations and checking the reliability of the results (Geghards, 2010).
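A brute-force attack of the kind described can be sketched over a deliberately tiny key space. The identifier, the two-character key and the helper names below are illustrative assumptions; a realistic key length makes this exhaustive search infeasible.

```python
import hashlib
import itertools
import string

# Toy brute force: the attacker knows the identifier and the pseudonym,
# and tries every candidate key until the derived pseudonym matches.

def pseudonym(key: str, identifier: str) -> str:
    return hashlib.sha256((key + identifier).encode()).hexdigest()

observed = pseudonym("zz", "jane.doe")   # the attacker sees only this value

def brute_force(identifier: str, target: str, length: int = 2):
    # 26**2 = 676 candidates here; 2**256 for a real 256-bit key.
    for combo in itertools.product(string.ascii_lowercase, repeat=length):
        candidate = "".join(combo)
        if pseudonym(candidate, identifier) == target:
            return candidate
    return None

assert brute_force("jane.doe", observed) == "zz"
```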

The security requirements for cryptographic pseudonymisation are mainly of a technical and mathematical nature. In addition to choosing the right method, the pseudonymised result must be sufficiently long (IT-Grundschutz – BSI, 2011). Since the effort required to “crack” the pseudonymisation of personal data increases with the length of the key, the choice of an appropriate key length should take into account the foreseeable effort of a potential attacker and their financial and time resources, as well as possible technical developments, in particular in computing power (Opinion No 4/2007 on the concept of “personal data”, 2007). In addition, “encryption key management”, that is to say a secure encryption key generator, regular exchange of encryption keys, access authorisation, and secure archiving and destruction of encryption keys (IT-Grundschutz – BSI, 2011), can be considered an important part of the pseudonymisation process.
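The relationship between key length and attacker effort can be made concrete with a back-of-envelope calculation. The assumed attacker capacity of 10^12 attempts per second is purely illustrative.

```python
# Each additional key bit doubles the keyspace and hence the expected
# brute-force effort. Assumed (illustrative) attacker capacity below.

ATTEMPTS_PER_SECOND = 10**12
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_exhaust(key_bits: int) -> float:
    """Worst-case years needed to try every key of the given length."""
    return (2 ** key_bits) / ATTEMPTS_PER_SECOND / SECONDS_PER_YEAR

for bits in (56, 80, 128):
    print(f"{bits}-bit key: {years_to_exhaust(bits):.3g} years")
```

This is why the foreseeable growth of computing power matters for the risk analysis discussed below: a fixed key length that is adequate today shifts down this table as attacker capacity grows.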

Reference to the data subject in the pseudonymisation process based on the GDPR

As mentioned above, while it is possible to identify specific data subjects from personal data before pseudonymisation, after pseudonymisation this is possible only for the controller who holds the encryption key. Unfortunately, the Law on the Processing of Personal Data does not define pseudonymisation in the context of Latvia. The term is, however, defined in the GDPR, which states that

“pseudonymisation” means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person

(Regulation (EU) 2016/679, Article 4(5)); given the processing advocated here, without the possibility of linking specific data to a specific data subject, the application of data protection law can be partially excluded when processing data in the cloud, provided that the personal data are pseudonymised and that the level at which access to personal data is possible is assessed (Weichert, 2010). In this case, the controller may restore the original content of the personal data, if necessary. All other parties can gain access to the data subject's data only through a “brute force attack”, and the data subjects' identifiability depends on the technical means used and the security of the pseudonymisation. The encryption key therefore acts as a separation function: pseudonymisation of personal data in the cloud renders the data pseudonymised from the controller's side but anonymised from everyone else's side (Opinion No 4/2007 on the concept of “personal data”, 2007). It can thus be concluded that the transfer of data to the cloud may be possible without a transfer of personal data, and may therefore not be relevant from a data protection perspective, provided that the controller stores the encryption key in the most secure way possible, third-party access to the key is not possible, and pseudonymisation excludes the identification of data subjects by attackers or makes identification so difficult that it is possible only with disproportionate effort. In this context, the cloud becomes a kind of “black box” (Heidrich & Wegener, 2010): although the data are stored there, they are in a format that is not identifiable to anyone without additional database knowledge and access.

Pseudonymisation risk analysis

Pseudonymisation in the cloud context can only ever guarantee a temporary exclusion of personal data references. The performance of IT systems is constantly improving, and technical developments increase the likelihood that a cyber-attack already underway may become successful if the technical requirements for pseudonymisation are not strengthened over time, as the risk of a successful cryptanalysis increases with time. Pseudonymisation methods that are secure today may, as technology develops, be cracked after a certain period with reasonable effort; if the cryptographic key is cracked, the personal reference is restored, and the transmission of, and access to, personal data by third parties would constitute a data breach (Roßnagel & Scholz, 2000). There is currently no concrete solution for making the risk of possible de-anonymisation completely controllable in the future, nor for determining what requirements should be set for “secure” pseudonymisation.

If pseudonymised personal data are decrypted and not further protected, they may become practically uncontrollable in the online environment, place a significant burden on the right to informational self-determination of the data subject to whom they can be linked, and cause incalculable harm even in the distant future. As developments in de-anonymisation technologies and cryptanalysis, in particular advances in IT computing power, are difficult to quantify, and unforeseeable leaps in technological development occur regularly, it is important for the controller to be aware that, in the event of a data breach, the cloud user will always have to live with some residual risk of de-anonymisation of personal data.

Risk precaution analysis can be used to assess the benefits and risks of pseudonymisation to determine the quality security requirements for pseudonymisation (Roßnagel & Scholz, 2000).

Where pseudonymisation is used as a technical and organisational measure, the assessment of the potential risk should consider current and foreseeable scientific and technological developments with regard to pseudonymisation, based on anticipated scientific and technological developments in the context that a reference to a person could be disclosed in the foreseeable future (Roßnagel & Scholz, 2000). In assessing this risk of pseudonymisation, it is always necessary to analyse whether pseudonymisation based on the current state of the art and digital developments is sufficient to make the decryption of personal data impossible or disproportionate to the means currently available and what the evolution of the risk might be with technological developments in the foreseeable future.

Technically and legally secure pseudonymisation

Specific regulatory requirements for pseudonymisation that can remove the reference to a specific data subject, and for which there is a common approach or set of standards, have not yet been developed. The fact that no uniform normative requirements exist at either the European Union (EU) or the Latvian level creates considerable legal uncertainty for the controller. In view of future developments in the field of pseudonymisation, it currently remains unclear whether and for how long data processing is permissible on the basis of the absent reference to personal data. To address this lack of clarity, it is suggested that future legislation on pseudonymisation mandate appropriate standardisation so that concrete legal requirements can be expressed, for example a specification that individual data are not personal data if the controller has pseudonymised them and is the sole holder of the key to the pseudonymised data, together with additional specifications such as the means to be used for data processing, the duration of data storage, the manner of data erasure and other requirements.

Looking ahead, the question arises as to who should carry out the risk identification and risk assessment process and, on that basis, develop concrete, binding security specifications and requirements for pseudonymisation (e.g., the minimum length of the encryption key or appropriate procedures to counter de-anonymisation, depending on the category of personal data or other factors). One widely held view is that this standardisation process could be entrusted exclusively to the relevant experts, who could keep abreast of scientific and technological developments, modify the requirements over time in the light of those developments, and formulate guidelines referencing legal provisions that specify particular procedures against de-anonymisation, storage times and other time limits, in line with the processing requirements.

If such rules were developed, the controller could transmit pseudonymised personal data containing no reference to specific data subjects without concern about attempted unauthorised access or decryption, since the technology employed for the pseudonymisation would reflect the current state of the art in science and technology; accordingly, an assessment would still be needed of how many resources should be devoted to developing the parameters that best characterise the pseudonymisation process.

Challenges with cloud backups

To prevent the disclosure of personal data references, pseudonymisation should be repeated on a regular basis using the chosen pseudonymisation method. This, however, raises the problem that copies of previously pseudonymised data might have been made before the data are re-pseudonymised, creating a risk that the processing might later become inadmissible if those copies could be de-anonymised at a later stage. To exclude the future restoration of a personal reference, a means of preventing unnoticed or uncontrolled copying of data by third parties is therefore necessary. If backup copies are needed in the context of cloud computing, appropriate data management must ensure that all backup copies are deleted or re-anonymised at the time of re-pseudonymisation, and neither before nor after; that is to say, all activities in the pseudonymisation process must be synchronised. A solution to the backup problem could also be a “digital expiry date”, a method that triggers automatic deletion of data at a certain point in time (Federrath et al., 2011).
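The “digital expiry date” idea can be sketched as a store that refuses to return, and purges, records past their expiry time. This is an illustrative sketch; the class and field names are assumptions, not the mechanism described by Federrath et al.

```python
import time

# Hypothetical sketch of a "digital expiry date": each stored record
# carries an expiry timestamp, and the store purges records past it,
# so stale backups cannot outlive a re-pseudonymisation cycle.

class ExpiringStore:
    def __init__(self):
        self._records = {}

    def put(self, key: str, value: bytes, ttl_seconds: float) -> None:
        self._records[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key: str):
        entry = self._records.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._records[key]     # automatic deletion at expiry
            return None
        return value

store = ExpiringStore()
store.put("backup-1", b"pseudonymised blob", ttl_seconds=0.05)
assert store.get("backup-1") == b"pseudonymised blob"
time.sleep(0.06)
assert store.get("backup-1") is None   # expired and purged
```

In the backup scenario described above, the time-to-live would be set to the re-pseudonymisation interval, so every copy is gone by the time a new pseudonymisation round begins.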

Also, where, for example, personal data are stored on multiple cloud hosts to meet security requirements more effectively, it should be ensured that, during the backup process, all these cloud hosts run synchronously, with no retained copies, and that the synchronisation of the data, their deletion and the generation of new backups all take place at the same time.

Processing of pseudonymised data

On the one hand, pseudonymised data destined for cloud storage are protected by end-to-end pseudonymisation throughout their storage lifetime (Arning et al., 2006). On the other hand, targeted alteration of the content of pseudonymised data is currently not technically possible, at least in practice (Bedner, 2013). If data are to be modified or processed in the cloud, the controller must either communicate a key to the cloud service provider before the data are processed, or the data themselves must be pseudonymised in a way that fully complies with the requirements of data protection law (Heidrich & Wegener, 2010). One technical solution that could be considered in the future is a secure processing unit in the cloud whose contents cannot be accessed even by the cloud service provider, so that personal data can be processed there in non-pseudonymised form without revealing the reference to the data subject (Taeger, 2013).

Conclusion

Personal data may be pseudonymised using secure pseudonymisation with cryptographic elements, provided that an appropriate forward-looking risk analysis sufficiently excludes disclosure of the algorithm used to pseudonymise the data, no undetected copies of the data exist, and the key of the pseudonymisation cipher remains secure and accessible only to the controller. For this purpose, appropriate standardisation would be necessary to allow controllers to assess the security of pseudonymised data, that is to say, the requirements on the key holder concerning the means used to process the data, the duration for which they need to be stored, the means of erasing them and other requirements. The fact that no uniform normative requirements have been developed at either the EU or the Latvian level creates considerable legal uncertainty.

In view of the future developments in the field of pseudonymisation, it currently remains unclear whether and for how long data processing is permissible due to the lack of reference to personal data. Whether the pseudonymisation method and the associated key length are sufficiently secure depends on the risk analysis, which considers the state of the art, that is to say the state of the current progress in science and technology, regarding the threat of de-anonymisation. Guidelines or some other guiding document should be established in the jurisdictions of the EU to allow for the regular establishment and updating of pseudonymisation requirements, thereby ensuring legal certainty.

eISSN:
2256-0548
Language:
English
Publication timeframe:
3 times per year
Journal subjects:
Law, International Law, Foreign Law, Comparative Law, other, Public Law, Criminal Law