What is Pseudonymization?
Pseudonymization is a method that replaces identifying data (for example, an e-mail address or a name) with an alias or pseudonym. It is a reversible process that de-identifies data but allows re-identification later if necessary.
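To make the idea concrete, here is a minimal Python sketch of a reversible alias table. The class name `Pseudonymizer`, the `user_` prefix, and the in-memory dictionaries are assumptions made for illustration only; a real system would keep the mapping in a separately secured store.

```python
import secrets


class Pseudonymizer:
    """Hypothetical helper: swaps values for random aliases and back."""

    def __init__(self):
        self._alias_by_value = {}  # original value -> alias
        self._value_by_alias = {}  # alias -> original value (the "key")

    def pseudonymize(self, value: str) -> str:
        # Reuse the existing alias so the same value always maps to one alias.
        if value in self._alias_by_value:
            return self._alias_by_value[value]
        alias = "user_" + secrets.token_hex(4)
        while alias in self._value_by_alias:  # very unlikely collision
            alias = "user_" + secrets.token_hex(4)
        self._alias_by_value[value] = alias
        self._value_by_alias[alias] = value
        return alias

    def reidentify(self, alias: str) -> str:
        # Only possible for whoever holds the mapping.
        return self._value_by_alias[alias]


pseudonymizer = Pseudonymizer()
alias = pseudonymizer.pseudonymize("jane.doe@example.com")
original = pseudonymizer.reidentify(alias)
```

Anyone who sees only `alias` learns nothing about the e-mail address, while the holder of the mapping can still get back to `original`.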
How is pseudonymization used in data protection?
Pseudonymization makes personal data processing easier, reducing the risk of exposing sensitive data to unauthorized personnel and employees.
For example, consider sending Excel sheets containing sensitive data via e-mail. Although the sender and receiver of the e-mails are authorized to access that information, your IT support also has access to those e-mails. Now imagine the sheets contained upper management bonuses or information about company wages.
When the data is pseudonymized, the chance of exposing personal data is much lower, because the records can no longer be attributed to an individual without the key, yet they remain suitable for data processing and data analysis.
What is a pseudonym?
In this context, a pseudonym is an identifier that is associated with an individual.
Just like writers use pseudonyms to conceal their identity and protect their privacy, pseudonyms are used for the same purpose in data protection.
A pseudonym can be a number, letter, special character, or any combination of those, tied to a specific piece of personal data or to an individual. It therefore makes data safer to use in a business environment.
What does GDPR say about pseudonymization?
In Article 4(5) of the GDPR, the process of pseudonymization is defined as:
“the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”
If you are a DPO, you can see the appeal and benefits of pseudonymization. It keeps data re-identifiable when needed but unintelligible to unauthorized users, and it allows data processors and data controllers to lower the risk of a potential data breach and safeguard personal data.
The GDPR requires you to take appropriate technical and organizational measures to protect personal data, and pseudonymization can be a suitable method if you want to preserve data utility.
Is pseudonymized data still personal data according to the GDPR?
Pseudonymized data is still considered personal data under the GDPR because the process is reversible: with the proper key, you can identify the individual. Recital 26 explains:
“…data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information, should be considered to be information on an identifiable natural person.”
Also, during a data breach, the key itself can be exposed, which puts the pseudonymized data at risk as well.
Is anonymized data still considered personal data?
The GDPR is concerned only with the processing of personal data, that is, data relating to a natural person who can be identified directly or indirectly.
If the data is anonymized so individuals can no longer be identified, GDPR simply doesn’t see it as personal data anymore. However, anonymizing data can often destroy the value that data holds for your organization.
Anonymization vs. Pseudonymization
Although pseudonymization and anonymization are both used to protect the identity of the individual, they are not synonyms. The example below depicts (in a simple way) how both of those techniques actually work:
With pseudonymization, if you are authorized to access that information, you will have the key that enables you to re-identify the data.
Anonymization is a technique that irreversibly alters data so an individual is no longer identifiable directly or indirectly.
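The difference can be sketched in a few lines of Python. The record fields, the function names, and the ten-year age bands are hypothetical choices for illustration; whether a real data set counts as anonymized depends on the actual re-identification risk, not on dropping a single column.

```python
import secrets


def pseudonymize_record(record: dict, key_table: dict) -> dict:
    """Replace the name with a random alias and remember the link in key_table."""
    alias = secrets.token_hex(8)
    key_table[alias] = record["name"]
    return {**record, "name": alias}


def anonymize_record(record: dict) -> dict:
    """Drop the name and generalize the age; no link back is kept anywhere."""
    decade = record["age"] // 10 * 10
    return {"age_band": f"{decade}-{decade + 9}", "department": record["department"]}


employee = {"name": "Jane Doe", "age": 34, "department": "HR"}

key_table = {}  # must be stored separately from the pseudonymized data
pseudonymized = pseudonymize_record(employee, key_table)
anonymized = anonymize_record(employee)

# With the key table, the pseudonymized record can be traced back.
recovered_name = key_table[pseudonymized["name"]]
```

Here `pseudonymized` keeps every field and can be reversed through `key_table`, while `anonymized` has lost both the name and the exact age for good.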
Both methods are highly recommended. The choice depends on many factors, such as the use case, the degree of risk, and the way data is processed within your company. In particular, the purpose of processing, the type of data you process, and the data breach risk it poses will determine the best method for you.
Compared to anonymization, pseudonymization is a much more sophisticated option since it leaves you the key to “unlock” the data. This way, data is not considered directly identifying, and it is not anonymized either, so it doesn’t lose its original value.
Why should you opt for pseudonymization?
In the everyday operations of any business, a lot of sensitive data goes through HR, marketing, or IT departments, and pseudonymization can help you lower the risk and limit the impact of a possible data breach. Recital 28 states:
“The application of pseudonymization to personal data can reduce the risks to the data subjects concerned and help controllers and processors to meet their data-protection obligations.”
Pseudonymization not only protects data but also supports the overall GDPR compliance of any organization.
Pseudonymization in practice
Pseudonymization can be achieved using various methods like data masking, encryption, or tokenization. It is commonly used as a technique to protect personal data on legacy production systems from unauthorized access where other security methods are inapplicable.
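As a rough illustration of the first of these methods, data masking, the Python sketch below hides most of an e-mail address and most digits of a phone number. The function names and masking rules are assumptions, not a standard. Note that masking on its own is one-way, so it only supports re-identification if the original values are kept separately.

```python
def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "*" * len(email)  # not a usable address: mask everything
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def mask_digits(value: str, visible: int = 4) -> str:
    """Replace every digit except the last `visible` ones with '*'."""
    total_digits = sum(ch.isdigit() for ch in value)
    seen = 0
    masked = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            masked.append(ch if seen > total_digits - visible else "*")
        else:
            masked.append(ch)  # keep separators such as spaces or '+'
    return "".join(masked)


masked_email = mask_email("jane.doe@example.com")  # "j*******@example.com"
masked_phone = mask_digits("+1 555 123 4567")      # "+* *** *** 4567"
```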
However, when implementing new IT systems, organizations need to think about data protection by design and by default (Article 25), and systems with data architectures that utilize pseudonymization can be very effective.
Another common use case for pseudonymization on production systems that process personal data is to temporarily store the original values while anonymizing personal data, as a fail-back mechanism.
In this case, pseudonyms can be stored for a short period of time, enough for the business to confirm anonymization has been completed successfully.
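A fail-back store of this kind might look like the following Python sketch. The `FailbackStore` class, its method names, and the time-to-live approach are hypothetical; the `now` parameter exists only so the behavior can be demonstrated without waiting.

```python
import time
from typing import Optional


class FailbackStore:
    """Hypothetical short-lived store for original values during anonymization."""

    def __init__(self, ttl_seconds: float):
        self._ttl = ttl_seconds
        self._entries = {}  # record id -> (original value, time stored)

    def keep(self, record_id: str, original: str, now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self._entries[record_id] = (original, stored_at)

    def restore(self, record_id: str, now: Optional[float] = None) -> str:
        """Return the original value, but only inside the fail-back window."""
        now = time.time() if now is None else now
        original, stored_at = self._entries[record_id]
        if now - stored_at > self._ttl:
            del self._entries[record_id]
            raise KeyError(f"fail-back window expired for {record_id}")
        return original

    def purge_expired(self, now: Optional[float] = None) -> int:
        """Delete entries older than the window; return how many were removed."""
        now = time.time() if now is None else now
        expired = [rid for rid, (_, stored_at) in self._entries.items()
                   if now - stored_at > self._ttl]
        for rid in expired:
            del self._entries[rid]
        return len(expired)
```

Once the business has confirmed that anonymization succeeded, calling `purge_expired` on a schedule removes the last copy of the original values.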
1. Anonymize personal data on non-production environments
It is highly recommended to anonymize personal data in non-production environments used for development, testing, and training purposes. Data sets with anonymized personal information are still great for development, statistics, and analytics.
2. Use pseudonymization on production systems
When designing data protection for live production systems, it is recommended to use pseudonymization. By doing so, only authorized users will have access to data subjects’ personal data. Once the lawful basis for processing a data subject’s personal data no longer exists, the system will delete the pseudonym, which leaves the remaining data anonymized (the data subject is forgotten).
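The "forgetting" step can be sketched as deleting one entry from the key table. The table layout, the `p-1001` style pseudonyms, and the `forget` function are assumptions for illustration; in practice the remaining records must also be checked for other details that could still identify the person.

```python
# Hypothetical key table: pseudonym -> identifying details, stored separately.
key_table = {
    "p-1001": {"name": "Jane Doe", "email": "jane.doe@example.com"},
    "p-1002": {"name": "John Roe", "email": "john.roe@example.com"},
}

# Business data refers to people only through their pseudonym.
orders = [
    {"customer": "p-1001", "total": 42.0},
    {"customer": "p-1002", "total": 17.5},
]


def forget(pseudonym: str, keys: dict) -> bool:
    """Delete the mapping for one data subject; True if something was removed."""
    return keys.pop(pseudonym, None) is not None


def can_reidentify(pseudonym: str, keys: dict) -> bool:
    """A record is linkable to a person only while its key table entry exists."""
    return pseudonym in keys


removed = forget("p-1001", key_table)
```

After the call, the order for `p-1001` is still available for statistics, but nothing in the key table links it to a person anymore.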
3. Automate pseudonymization and anonymization
No matter the use case, both pseudonymization and anonymization should be automated, and so should data validations. Automate your processes as much as possible: data management is a complex subject, and manual steps invite human error.
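One simple automated validation is to scan the output of a pseudonymization or anonymization job for values that still look like e-mail addresses. The sketch below is a hedged example: the regular expression is deliberately rough, and the function name and row format are assumptions.

```python
import re

# Rough pattern for e-mail-like strings; not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_leaks(rows: list) -> list:
    """Return (row index, field name) for every field that still holds an e-mail."""
    leaks = []
    for index, row in enumerate(rows):
        for field, value in row.items():
            if isinstance(value, str) and EMAIL_RE.search(value):
                leaks.append((index, field))
    return leaks


processed_rows = [
    {"id": "user_3fa1", "note": "prefers phone contact"},
    {"id": "user_9c2e", "note": "reach me at bob@example.com"},
]

leaks = find_email_leaks(processed_rows)  # [(1, "note")]
```

A check like this can run after every job and fail the pipeline when `leaks` is not empty, instead of relying on someone to inspect the data by hand.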
4. Choose an appropriate technique
Also, note that the techniques used should be applicable to a specific use case or system. Sometimes it will make more sense to create the same pseudonym for everything,
ENISA recommendations and guidelines for pseudonymization
The ENISA guide discusses the criteria for choosing proper pseudonymization techniques, such as data protection, scalability, and recovery. The guide also reflects on specific use cases for different identifiers, such as IP addresses or e-mail addresses.
The report concluded that there is not just one solution or one way to operationalize pseudonymization that works for all industries or all scenarios.
“…there is no single easy solution to pseudonymisation that works for all approaches in all possible scenarios. On the contrary, it requires a high level of competence in order to apply a robust pseudonymisation process, possibly reducing the threat of discrimination or re-identification attacks, while maintaining the degree of utility necessary for the processing of pseudonymised data.”
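As one possible illustration for identifiers such as e-mail or IP addresses, the Python sketch below derives pseudonyms with a keyed hash (HMAC-SHA256) from the standard library. It is not taken from the report: the function name, the normalization step, and the 16-character truncation are assumptions. A keyed hash cannot be reversed directly, so re-identification requires the secret key plus the candidate identifiers, or a separately stored mapping.

```python
import hashlib
import hmac


def keyed_pseudonym(identifier: str, secret_key: bytes, length: int = 16) -> str:
    """Derive a stable pseudonym from an identifier and a secret key."""
    # Normalize so "Jane@Example.com " and "jane@example.com" match.
    normalized = identifier.strip().lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


# Placeholder key for the example; a real key must be random and kept secret.
key = b"example-key-do-not-use-in-production"

email_alias = keyed_pseudonym("Jane.Doe@example.com", key)
ip_alias = keyed_pseudonym("192.0.2.17", key)
```

The same identifier always yields the same pseudonym under one key, which keeps records linkable for analysis, while a different key produces unrelated pseudonyms.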
Is the cost greater than the benefits?
Privacy and IT professionals who understand this topic can appreciate the benefits of data pseudonymization and data anonymization. However, the idea should reach a broader audience.
Business owners, non-profit organizations, SMEs, and big enterprises are all subject to the GDPR. They are therefore all responsible for protecting personal data and are exposed to potential fines and reputational damage.
It is important to understand that the benefits of implementing such techniques outweigh the implementation costs and contribute to the education of everyone handling and processing personal data.