Pseudonymization according to the GDPR [definitions and examples]

What is Pseudonymization?

Pseudonymization is a process that replaces identifying data (for example, a data subject’s e-mail address or name) with an alias or pseudonym. It is a reversible process that de-identifies data but allows re-identification later on if necessary.

This is a well-known data management technique that is highly recommended by the General Data Protection Regulation (GDPR) as one of the data protection methods.

Anonymization and pseudonymization are not the same methods. The main difference is that pseudonymization is a reversible process, unlike anonymization.

Pseudonymization makes it easier for data processors to process personal information without the fear of exposing sensitive data to personnel and employees that should not have access to it.

For example, we have all seen Excel sheets containing sensitive data being sent via e-mail. Although the sender and receiver of those e-mails are authorized to access that information, your IT support may also have access to those e-mails. Now imagine the sheet contained upper management bonuses or information about company wages.

When the data is pseudonymized, there is much less chance of exposing personal data, since a pseudonym makes the data record unidentifiable on its own while the record remains suitable for data processing and data analysis.

What is a pseudonym?

In this context, a pseudonym is an identifier. It is personal data that is associated with an individual.

A pseudonym is still considered to be personal data according to the GDPR, since the process is reversible, and with a proper key you can identify the individual.

Is anonymized data still considered personal data?

The GDPR is only concerned with the processing of personal data, that is, information relating to a natural person that allows the individual to be identified directly or indirectly.

If the data is anonymized so the data subject is no longer identifiable (directly or indirectly), the GDPR simply doesn’t see it as personal data anymore. However, anonymizing data can often destroy the value that data holds for your organization.

Is pseudonymized data still personal data?

When compared to anonymization, pseudonymization is a more sophisticated option since it leaves you the key to “unlock” the data. This way the data is not directly identifying, but it is not anonymized either. Pseudonymized data allows you to indirectly identify the individual.

In Article 4(5) of the GDPR, the process of pseudonymization is defined as:

“the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”

If you are a DPO, you can see the appeal and benefits of pseudonymization. It makes data identifiable if needed, but inaccessible to unauthorized users. Pseudonymization allows data processors and data controllers to lower the risk of a potential data breach and safeguard personal data.

The GDPR requires you to take all appropriate measures and steps to protect personal data. Although pseudonymization by itself is not a sufficient method, it allows businesses to protect data by separating the direct identifiers from the rest of the data, while the data utility remains the same.

However, the GDPR sees pseudonymized data as personal data still. Why? It is explained in Recital 26:

“…data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information, should be considered to be information on an identifiable natural person.”

So, as long as a person can be identified using other identifiers, the data is considered personal data under the GDPR, which seems fair from the data subjects’ point of view. However, if you use this technique, GDPR requirements become much easier to meet.

Anonymization vs. Pseudonymization

First of all, it is important to note that pseudonymization and anonymization are not the same. The example below depicts the end result of data pseudonymization and anonymization.

You can see the difference at first glance. Anonymized data does not tell you anything, while pseudonymized data is masked but still has a specific identifier that lets you access the data. If you are authorized to access that information, you will have the key that enables you to re-identify the data.

[Figure: pseudonymization vs. anonymization]

Anonymization is a technique that irreversibly alters data so the data subject is no longer identifiable directly or indirectly. Unlike pseudonymized data, anonymized data is no longer considered personal data.

When it comes to anonymization and pseudonymization, it is very important to note that the GDPR draws a clear distinction between the two.

Both methods are highly recommended. The choice will depend on many factors (the use case, the degree of risk, the way data is processed within your company…).

The best method for you will be determined by the type of data you process and the risk of a data breach it poses.

How does pseudonymization work?

This is a very simple visualization of how pseudonymization and anonymization work:

On the left, you can see the original data, while on the right you can see pseudonymized data. Although you can’t find out any information from pseudonymized data, you can still detect that some of those pseudonyms are repeating.

That means they relate to the same data subject, but you will still need additional information connecting the pseudonyms with the original data set to be able to read it. Anonymized data, on the other hand, makes it impossible to tell whether the data relates to two, three, or four people.

Only authorized users are allowed to map pseudonyms back to the original information, so access to personal data is limited to them.
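The mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `PseudonymVault` class, its method names, the `PSN-` token format, and the sample records are all hypothetical, and a real system would keep the mapping in a separately secured store with access controls.

```python
import secrets


class PseudonymVault:
    """Hypothetical lookup table kept separately from the pseudonymized data."""

    def __init__(self):
        self._to_pseudonym = {}  # original value -> pseudonym
        self._to_original = {}   # pseudonym -> original value

    def pseudonymize(self, value: str) -> str:
        # Reuse an existing pseudonym so the same data subject always
        # maps to the same token (this is why pseudonyms repeat).
        if value not in self._to_pseudonym:
            token = "PSN-" + secrets.token_hex(8)
            self._to_pseudonym[value] = token
            self._to_original[token] = value
        return self._to_pseudonym[value]

    def reidentify(self, pseudonym: str) -> str:
        # Only code that holds the vault can reverse the mapping.
        return self._to_original[pseudonym]


vault = PseudonymVault()
records = [
    {"name": "Alice Smith", "purchase": "laptop"},
    {"name": "Bob Jones", "purchase": "phone"},
    {"name": "Alice Smith", "purchase": "monitor"},
]

# The shared data set carries pseudonyms instead of names.
pseudonymized = [
    {"subject": vault.pseudonymize(r["name"]), "purchase": r["purchase"]}
    for r in records
]
```

Anyone who receives `pseudonymized` can see that the first and third rows belong to the same subject, but cannot recover the name without access to `vault`.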

Why should you use pseudonymization?

In the everyday operations of any business, a lot of sensitive data sets pass through many different hands (data from HR, marketing information, employees’ sensitive data…). Pseudonymization can help you lower the risk and avoid a possible data breach, which may result in high GDPR fines. This is explained in Recital 28:

“The application of pseudonymization to personal data can reduce the risks to the data subjects concerned and help controllers and processors to meet their data-protection obligations.”

There are also some direct benefits of pseudonymization that the GDPR allows for. For example, the GDPR requires that you collect data only for specific, explicit, and legitimate purposes (Article 5), but there is an exception to this purpose limitation principle. You can rely on that exception only if the data is further processed in a way that is compatible with the initial purposes for collection.

If you rely on this exception, make sure there is a distinguishable link between the processing activities, the nature of data, and so on.

Pseudonymization use cases

Pseudonymization can be achieved using various methods like data masking, encryption or tokenization. It is commonly used as a technique to protect personal data on legacy production systems from unauthorized access where other security methods are inapplicable.

However, when implementing new IT systems, organizations need to think about data protection by design and by default, and systems with data architectures utilizing pseudonymization can be very effective.

Another common use case for pseudonymization on production systems that process personal data is to temporarily store the original values while anonymizing personal data, as a fail-back mechanism.

In this case, pseudonyms can be stored for a short period of time, just long enough for the business to confirm that anonymization has been completed successfully.
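As a rough sketch of this fail-back pattern, the snippet below anonymizes a record while holding the original values in a temporary backup until the result is confirmed. The function names (`anonymize_with_backup`, `roll_back`, `confirm`), the `restore_id` field, and the in-memory `backup` dictionary are assumptions made for illustration; a real system would use a secured, time-limited store.

```python
import secrets

# Hypothetical temporary store: pseudonym -> original values.
backup = {}


def anonymize_with_backup(record: dict, fields: list) -> dict:
    """Blank the given fields, keeping the originals under a temporary pseudonym."""
    pseudonym = secrets.token_hex(8)
    backup[pseudonym] = {f: record[f] for f in fields}
    cleaned = {**record, **{f: None for f in fields}}
    cleaned["restore_id"] = pseudonym
    return cleaned


def roll_back(record: dict) -> dict:
    """Restore the original values if the anonymization run was faulty."""
    originals = backup.pop(record["restore_id"])
    restored = {**record, **originals}
    del restored["restore_id"]
    return restored


def confirm(record: dict) -> dict:
    """Discard the backup; after this the record can no longer be restored."""
    backup.pop(record["restore_id"], None)
    final = dict(record)
    del final["restore_id"]
    return final


customer = {"id": 7, "name": "Alice Smith", "email": "alice@example.com", "plan": "pro"}

# First run: something went wrong, so the originals are restored.
staged = anonymize_with_backup(customer, ["name", "email"])
restored = roll_back(staged)

# Second run: the business confirms, so the backup is deleted for good.
staged_again = anonymize_with_backup(customer, ["name", "email"])
final = confirm(staged_again)
```

Until `confirm` is called, the record is only pseudonymized, because the backup still links it to the original values. Once the backup entry is deleted, that link is gone.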

Recommendations for pseudonymization

  • Anonymize personal data on non-production systems

It is highly recommended to anonymize personal data in non-production environments used for development, testing, and training purposes. Data sets with anonymized personal information are still great for development, statistics, and analytics.

  • Use pseudonymization on production systems

When designing data protection for live production systems, it is recommended to use pseudonymization. By using pseudonymization, only authorized users will have access to data subjects’ personal data. Once the lawful basis for processing data subject’s personal data no longer exists, the system will delete the pseudonym and make the data subject anonymized (forgotten).

  • Automate pseudonymization and anonymization 

No matter the use case, both pseudonymization and anonymization should be automated. So should data validations. Automate your processes as much as possible, since data management is a complex subject and human error is something to avoid.

  • Use appropriate techniques

Also, note that the techniques used should be applicable to a specific use case or system. Sometimes it will make more sense to create the same pseudonym for everything, 

EU recommendations and guidelines for pseudonymization

The European Union Agency for Cybersecurity (ENISA) published a new report on pseudonymization techniques and best practices that deals with the technical solutions for pseudonymization that can be applied in practice.

ENISA issued the report in response to the challenges of implementing pseudonymization in practice.

The guide discusses the criteria for choosing proper pseudonymization techniques, such as data protection, scalability, and recovery. It also reflects on specific use cases for different identifiers, such as an IP address or an email address.
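To make the idea of pseudonymizing an identifier such as an email or IP address more concrete, here is a small sketch using a keyed hash (HMAC-SHA256) from Python's standard library. It is an illustration of one commonly used technique, not a summary of what the report recommends: the `keyed_pseudonym` function, the `SECRET_KEY` value, the lowercase normalization, and the 16-character truncation are all assumptions chosen for this example.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key management
# system and be stored separately from the pseudonymized data.
SECRET_KEY = b"replace-with-a-randomly-generated-key"


def keyed_pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Derive a deterministic pseudonym from an identifier using HMAC-SHA256."""
    # Normalize so that trivially different spellings map to one pseudonym.
    normalized = identifier.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; a shorter value raises the collision risk.
    return digest.hexdigest()[:16]


email_a = keyed_pseudonym("Alice@Example.com")
email_b = keyed_pseudonym("alice@example.com ")
ip = keyed_pseudonym("192.0.2.10")
```

Because the hash is keyed, someone without the key cannot recompute a pseudonym from a guessed email address, while whoever holds the key can. The key is therefore part of the additional information that must be kept separately. A keyed hash cannot be decrypted, so re-identification relies on recomputing pseudonyms for known identifiers or on a separately stored mapping.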

The report concluded that there is not just one solution or one way to operationalize pseudonymization that works for all industries or all scenarios.

“…there is no single easy solution to pseudonymisation that works for all approaches in all possible scenarios. On the contrary, it requires a high level of competence in order to apply a robust pseudonymisation process, possibly reducing the threat of discrimination or re-identification attacks, while maintaining the degree of utility necessary for the processing of pseudonymised data.”

Is the cost of pseudonymization greater than the benefits?

Privacy and IT professionals who understand this topic can immediately spot the benefits of data pseudonymization and data anonymization. However, the idea should reach a broader audience.

Business owners, non-profit organizations, SMEs, and big enterprise companies are all subject to the General Data Protection Regulation (GDPR). They are therefore all held responsible for the protection of personal data and exposed to potential fines and reputational damage.

It is important to understand that the benefits of implementing such techniques will outweigh the implementation costs and contribute to the education of everyone handling and processing personal data.