What is Pseudonymization?

Pseudonymization is a process that replaces original data values (for example, a data subject’s e-mail address) with an alias, or pseudonym.

This process lets an organization de-identify data while keeping the ability to re-identify it later.

It is a well-known data management technique, and the General Data Protection Regulation (GDPR) explicitly names it as one of the appropriate data protection measures.

Pseudonymization makes it easier for data processors to work with personal information without exposing sensitive data to employees who should not have access to it.

For example, we have all seen Excel sheets containing sensitive data being sent via e-mail. Although the sender and the receiver of those e-mails are authorized to access that information, your IT support also has access to the e-mails. Now imagine the sheet contained upper management bonuses or information about company wages.

A distinct pseudonym for each replaced data value keeps the record from being directly identifiable while leaving it suitable for data processing and data analysis.

What does the GDPR say about pseudonymization?

In Article 4(5) of the GDPR, the process of pseudonymization is defined as:

“the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”

If you are a DPO, the benefits of pseudonymization are easy to see. The data stays identifiable when needed but is inaccessible to unauthorized users. Pseudonymization helps data processors and data controllers lower the risk posed by a data breach and the risk of infringing on the rights of data subjects.

The GDPR requires you to take all appropriate measures to protect personal data, and pseudonymization should be your first step in that direction. It allows businesses to protect data by separating the direct identifiers from the rest of the data while preserving data utility.

However, the GDPR still treats pseudonymized data as personal data. Why? Recital 26 explains:

“…data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information, should be considered to be information on an identifiable natural person.”

So, as long as a person can be identified using additional information, the GDPR considers the data to be personal data, which seems fair from the data subjects’ point of view.

Why should you use pseudonymization?

In the everyday operations of any business, many sensitive data sets (HR data, marketing information…) pass through many different hands.

Since you are obligated to protect the personal data of your customers and employees, it is recommended that you reduce the risks arising from a possible data breach, which may result in high GDPR fines. Recital 28 describes how pseudonymization helps:

“The application of pseudonymization to personal data can reduce the risks to the data subjects concerned and help controllers and processors to meet their data-protection obligations.”

Pseudonymization use cases

Pseudonymization can be achieved using various methods like data masking, encryption or tokenization. It is commonly used as a technique to protect personal data on legacy production systems from unauthorized access where other security methods are inapplicable.

However, when implementing new IT systems, organizations need to think about data protection by design and by default, and systems whose data architectures use pseudonymization can be very effective.
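To make the methods mentioned above more concrete, here is a minimal Python sketch of two of them: simple data masking, and a tokenization-style pseudonym derived with a keyed hash (HMAC-SHA256). The keyed hash is our choice for illustration and is not prescribed by the GDPR. The names SECRET_KEY, pseudonymize and mask_email are hypothetical, and the sketch assumes the key is kept separately from the pseudonymized data.

```python
import hashlib
import hmac

# Hypothetical secret key. The assumption is that it is stored separately
# from the pseudonymized data (for example in a key management system).
SECRET_KEY = b"example-key-kept-in-a-separate-system"


def pseudonymize(value, key=SECRET_KEY):
    """Return a deterministic pseudonym for a value using HMAC-SHA256."""
    normalized = value.lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    # Keep only the first 16 hex characters to get a compact token.
    return "user_" + digest[:16]


def mask_email(email):
    """Simple data masking: keep the first character and the domain."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


print(pseudonymize("jane.doe@example.com"))
print(mask_email("jane.doe@example.com"))
```

Because this pseudonym is derived from a key rather than stored in a table, linking it back to a person requires the key together with either a list of candidate values or a separately kept lookup table.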

Another common use case on production systems that process personal data is to keep the original values temporarily while the data is being anonymized, as a fail-back mechanism.

In this case, pseudonyms can be stored for a short period of time, just long enough for the business to confirm that anonymization has been completed successfully.
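A fail-back store of this kind could look roughly like the Python sketch below. FailbackStore, its method names and the one-hour retention period are all hypothetical; a real system would persist the entries securely instead of holding them in memory.

```python
import time


class FailbackStore:
    """Hypothetical short-lived store of original values, keyed by record id."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._entries = {}  # record_id -> (original_value, stored_at)

    def keep(self, record_id, original, now=None):
        stored_at = time.time() if now is None else now
        self._entries[record_id] = (original, stored_at)

    def restore(self, record_id):
        # Raises KeyError once the entry has been purged.
        return self._entries[record_id][0]

    def purge_expired(self, now=None):
        now = time.time() if now is None else now
        expired = [rid for rid, (_, stored_at) in self._entries.items()
                   if now - stored_at >= self.retention_seconds]
        for rid in expired:
            del self._entries[rid]
        return len(expired)


store = FailbackStore(retention_seconds=3600)
store.keep("rec-1", "jane@example.com", now=0)
# If validation of the anonymized data fails, roll back from the store.
original = store.restore("rec-1")
# After the retention window the original value is removed for good.
removed = store.purge_expired(now=3600)
```

The explicit now parameter is only there to make the example deterministic; in normal use the store falls back to the system clock.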

How does pseudonymization work?

This is a very simple visualization of how pseudonymization and anonymization work:

On the left, you can see the real data, while on the right you can see pseudonymized data. Although you can’t learn anything about the data subjects from the pseudonymized data, you can still detect that some of the pseudonyms repeat. That means they refer to the same data subject. Anonymized data, on the other hand, makes it impossible to tell whether the data relates to 4 different people or just one person.

Re-identification still requires additional information that connects the pseudonyms with the original values.

Only authorized users are allowed to map pseudonyms back to the original information, so only they have access to the personal data.
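The behaviour described above can be sketched in a few lines of Python. This is a simplified illustration, not a production design: PseudonymVault is a hypothetical in-memory class, and the authorized flag stands in for a real access control check.

```python
import secrets


class PseudonymVault:
    """Hypothetical in-memory mapping between original values and pseudonyms."""

    def __init__(self):
        self._to_pseudonym = {}  # original value -> pseudonym
        self._to_original = {}   # pseudonym -> original value

    def pseudonymize(self, value):
        # Reuse the existing pseudonym so repeated values stay linkable.
        if value not in self._to_pseudonym:
            token = "P-" + secrets.token_hex(4)
            while token in self._to_original:  # extremely unlikely collision
                token = "P-" + secrets.token_hex(4)
            self._to_pseudonym[value] = token
            self._to_original[token] = value
        return self._to_pseudonym[value]

    def reidentify(self, pseudonym, authorized):
        if not authorized:
            raise PermissionError("re-identification requires authorization")
        return self._to_original[pseudonym]


vault = PseudonymVault()
emails = ["ana@example.com", "ben@example.com", "ana@example.com", "cy@example.com"]

# The first and third pseudonyms are identical: same data subject.
pseudonymized = [vault.pseudonymize(e) for e in emails]

# Anonymized output: nothing shows whether this is one person or four.
anonymized = ["REDACTED" for _ in emails]
```

The pseudonyms are random, so they reveal nothing on their own; the mapping inside the vault is the "additional information" that has to be kept separately and protected.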

Anonymization vs Pseudonymization

Anonymization is a technique that irreversibly alters data so that the data subject can no longer be identified, directly or indirectly. Unlike pseudonymized data, anonymized data is no longer considered personal data. Anonymization removes the risk of disclosing personal data when it is transferred to third parties or between entities.

When it comes to anonymization and pseudonymization, it is very important to note that the GDPR draws a clear distinction between the two.

Both methods are highly recommended; the choice will depend on:
• the use case,
• the degree of risk,
• the way data is processed within your company.

The best method for you will be determined by the type of data you process and the risk a data breach would pose.

However, as we mentioned before, the GDPR sees pseudonymized data as personal data, because the individual can be identified, while anonymized data is no longer considered personal data.

Recommendations for pseudonymization

  • Anonymize personal data on non-production systems

It is highly recommended to anonymize personal data in non-production environments used for development, testing and training purposes. Data sets with anonymized personal information are still well suited for development, statistics, and analytics.

  • Use pseudonymization on production systems

When designing data protection for live production systems, it is recommended to use pseudonymization. With pseudonymization, only authorized users have access to data subjects’ personal data. Once the lawful basis for processing a data subject’s personal data no longer exists, the system deletes the pseudonym, and the data subject becomes anonymized (forgotten).
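A minimal Python sketch of this "forgetting" step is shown below. It assumes a design in which operational records reference people only by pseudonym and a separate, restricted lookup table links each pseudonym to an identity. The names pseudonym_to_identity, orders, register and forget are hypothetical.

```python
import secrets

# Hypothetical stores. In a real system these would live in separate,
# differently protected databases.
pseudonym_to_identity = {}  # restricted lookup table
orders = []                 # operational data, keyed by pseudonym only


def register(email):
    """Create a random pseudonym and record the link to the identity."""
    token = "P-" + secrets.token_hex(8)
    pseudonym_to_identity[token] = email
    return token


def forget(token):
    """Drop the lookup entry; records keep the token but lose the link."""
    return pseudonym_to_identity.pop(token, None) is not None


token = register("jane@example.com")
orders.append({"customer": token, "total": 42.50})

# The lawful basis has ended: remove the link to the real identity.
forget(token)
```

Note that the remaining records only count as anonymized if no other field in them, alone or in combination, can still be used to identify the person.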

  • Automate pseudonymization and anonymization 

No matter the use case, both pseudonymization and anonymization should be automated, and so should data validations. Automate your processes as much as possible: data management is a complex subject, and human error is something to avoid.

  • Use appropriate techniques

Also, note that the techniques used should suit the specific use case or system. Sometimes it will make more sense to use the same pseudonym for everything; sometimes a realistic replacement value works better.

Example: you can always mask first names to “ANONYMIZED”, making the record immediately recognizable as pseudonymized. Or you can choose to generate a realistic name and change “Jane” into “Anna”. What you choose should make sense for your use case, environment, applicable data validations, etc.
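The two options from the example can be compared in a short Python sketch. The replacement pool, the salt value and the function names are hypothetical, and picking a name from a hash of the input is only one possible way to generate a realistic replacement.

```python
import hashlib

# Hypothetical replacement pool; any list of plausible first names would do.
REPLACEMENT_NAMES = ["Anna", "Boris", "Clara", "David", "Elena", "Filip"]


def mask_constant(_name):
    """Option 1: every first name becomes the same, obviously masked value."""
    return "ANONYMIZED"


def substitute_name(name, salt="demo-salt"):
    """Option 2: deterministically pick a realistic replacement name."""
    digest = hashlib.sha256((salt + name).encode("utf-8")).digest()
    return REPLACEMENT_NAMES[digest[0] % len(REPLACEMENT_NAMES)]


print(mask_constant("Jane"))
print(substitute_name("Jane"))
```

The constant mask is easy to spot but may fail validations that expect a plausible name, while the substitution passes such checks but looks like real data. With a pool this small, different names will often share a replacement, and a name can even map to itself, so a real implementation would need a larger pool and a rule for that case.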

EU recommendations and guidelines for pseudonymization

The European Union Agency for Cybersecurity (ENISA) published a new report on pseudonymization techniques and best practices that deals with technical solutions for pseudonymization that can be applied in practice.

ENISA issued the report in response to the challenges of implementing pseudonymization in practice.

The report discusses the criteria for choosing proper pseudonymization techniques, such as data protection, scalability, and recovery. It also covers specific use cases for different identifiers, such as IP addresses and email addresses.

The report concluded that there is not just one solution or one way to operationalize pseudonymization that works for all industries or all scenarios.

“…there is no single easy solution to pseudonymisation that works for all approaches in all possible scenarios. On the contrary, it requires a high level of competence in order to apply a robust pseudonymisation process, possibly reducing the threat of discrimination or re-identification attacks, while maintaining the degree of utility necessary for the processing of pseudonymised data.”

Conclusion

Privacy and IT professionals who understand this topic can see right away the benefits of implementing techniques like data pseudonymization and data anonymization. However, the idea should reach a broader audience.

Business owners, non-profit organizations, SMEs, and large enterprises are all subject to the General Data Protection Regulation (GDPR). They are therefore all responsible for protecting personal data and are exposed to potential fines and reputational damage.

We hope that the benefits of implementing such techniques will outweigh the implementation costs and contribute to the education of everyone handling and processing personal data.