What is Pseudonymization?
Pseudonymization is a process that replaces original data (for example, a data subject’s e-mail address) with an alias or pseudonym.
It lets an organization de-identify data while keeping the ability to re-identify it later on.
Pseudonymization makes it easier for data processors to process personal data without the fear of exposing sensitive data to personnel that should not have access to it.
For example, we have all seen Excel spreadsheets containing sensitive data sent via e-mail. Although the sender and receiver of the e-mail are authorized to access that information, the company’s IT support may also have access to those e-mails, which can amount to a data breach. Now imagine the spreadsheet contained management bonuses or information about company wages.
Assigning a distinct pseudonym to each replaced data value makes the data record unidentifiable while keeping it suitable for data processing and data analysis.
Pseudonymization use cases
Pseudonymization can be achieved using various methods, such as data masking, encryption or tokenization. It is commonly used to protect personal data on legacy production systems from unauthorized access where other security measures cannot be applied.
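To make tokenization more concrete, here is a minimal sketch in Python. The TokenVault class, its method names and the "tok_" prefix are assumptions made for illustration, not part of any particular product. A real vault would live in a separate, access-controlled store rather than in memory.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault, for illustration only."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def pseudonymize(self, value: str) -> str:
        # Reuse the existing token so the same value always gets the same pseudonym.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(8)  # random, so it reveals nothing about the value
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def reidentify(self, token: str) -> str:
        # Only code with access to the vault can map a token back to the original.
        return self._value_by_token[token]


vault = TokenVault()
emails = ["jane@example.com", "john@example.com", "jane@example.com"]
tokens = [vault.pseudonymize(e) for e in emails]
```

The tokens can be handed to downstream processing in place of the e-mail addresses, while the vault stays with the users who are authorized to re-identify the data.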
However, when implementing new IT systems, organizations need to think about data protection by design and by default, and data architectures that use pseudonymization can be very effective.
Another common use case on production systems that process personal data is to temporarily store the original values while anonymizing personal data, which provides a fallback mechanism.
In this case, pseudonyms can be stored for a short period of time, long enough for the business to confirm that anonymization has completed successfully.
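The fallback idea can be sketched as a short-lived store of original values. Everything below is an assumption for illustration: the FallbackStore class, the record layout and the seven-day retention period are hypothetical, and a real system would persist the originals in a protected store.

```python
from datetime import datetime, timedelta, timezone


class FallbackStore:
    """Hypothetical short-lived store of originals kept while anonymizing."""

    def __init__(self, retention: timedelta):
        self.retention = retention
        self._originals = {}  # record id -> (original value, time stored)

    def anonymize(self, record: dict, field: str, now: datetime) -> None:
        # Keep the original so the change can be rolled back if validation fails.
        self._originals[record["id"]] = (record[field], now)
        record[field] = "ANONYMIZED"

    def roll_back(self, record: dict, field: str) -> None:
        original, _ = self._originals.pop(record["id"])
        record[field] = original

    def purge_expired(self, now: datetime) -> int:
        # Once the retention period has passed, originals are deleted for good.
        expired = [rid for rid, (_, stored_at) in self._originals.items()
                   if now - stored_at >= self.retention]
        for rid in expired:
            del self._originals[rid]
        return len(expired)


store = FallbackStore(retention=timedelta(days=7))  # assumed retention period
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
jane = {"id": 1, "email": "jane@example.com"}
john = {"id": 2, "email": "john@example.com"}
store.anonymize(jane, "email", now=start)
store.anonymize(john, "email", now=start)
store.roll_back(jane, "email")  # the business rejected this change
purged = store.purge_expired(now=start + timedelta(days=8))
```

After the purge, the remaining anonymized record can no longer be restored, which is the point at which the anonymization becomes final.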
What does GDPR say about pseudonymization?
In Article 4(5) of the GDPR, the process of pseudonymization is defined as:
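“the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”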
If you are a DPO, you can surely see the benefits of pseudonymization. It keeps data identifiable when needed, but inaccessible to unauthorized users. Pseudonymization helps data processors and data controllers lower the risk of a data breach and of infringing on the rights of data subjects.
How does pseudonymization work?
This is a very simple visualization of how pseudonymization and anonymization work:
On the left, you can see the real data, while on the right you can see pseudonymized data. Although you cannot learn anything about a person from the pseudonymized data, you can still see that some of the pseudonyms repeat, which means they refer to the same data subject. To recover the original values, you still need the additional information that links pseudonyms to the original data.
Authorized users can map pseudonyms back to the original information, so only they have access to the personal data:
With anonymized data, on the other hand, it is impossible to tell whether the records relate to four different people or just one person.
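The difference can also be shown in a few lines of Python. This is only a sketch under stated assumptions: the names are made up, the keyed hash (HMAC-SHA256 from the standard library) is one possible way to derive pseudonyms, and the key shown inline would in practice be kept separately from the data.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key"  # assumption: a real key is stored separately and access-controlled


def pseudonym(value: str) -> str:
    # Keyed hash: the same input always produces the same pseudonym.
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:10]


names = ["Jane Doe", "John Smith", "Jane Doe", "Ann Lee"]  # made-up sample data

pseudonymized = [pseudonym(n) for n in names]
anonymized = ["XXXXX" for _ in names]

# The additional information held only by authorized users.
lookup = {pseudonym(n): n for n in names}

distinct_pseudonyms = len(set(pseudonymized))
distinct_anonymized = len(set(anonymized))
```

In the pseudonymized column the first and third rows share a pseudonym, so an analyst can tell they belong to the same person without knowing who that is. In the anonymized column every row looks the same, so the four rows could belong to four people or to one.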
Anonymization vs. Pseudonymization
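When it comes to anonymization and pseudonymization, it is important to note that the GDPR treats the two differently: pseudonymized data is still personal data, while truly anonymized data falls outside the scope of the regulation (Recital 26).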
Both methods are highly recommended. The choice will depend on:
- the use case,
- the degree of risk,
- the way data is processed within your company.
The best method for you will be determined by the type of data you process and the risk of a data breach it poses.
- Anonymize personal data on non-production systems
- It is highly recommended to anonymize personal data in non-production environments used for development, testing and training. Data sets with anonymized personal information are still well suited to development, statistics and analytics.
- Use pseudonymization on production systems
- When designing data protection for live production systems, pseudonymization is recommended. With pseudonymization, only authorized users have access to data subjects’ personal data. Once the lawful basis for processing a data subject’s personal data no longer exists, the system can simply delete the pseudonym mapping, leaving the data subject anonymized (forgotten).
- Automate everything
- Whatever the use case, both pseudonymization and anonymization should be automated, and so should data validations. Data management is complex, so automate as much as possible.
- Use appropriate techniques
- The techniques used should also fit the specific use case or system. Sometimes it makes more sense to use the same pseudonym for every value.
Example: you can always mask first names as “ANONYMIZED”, making the record immediately recognizable as pseudonymized. Or you can substitute a realistic name and change “Jane” into “Anna”. What you choose should make sense for your use case, environment, applicable data validations, etc.
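The two masking options in the example can be sketched as follows. The function names, the replacement name pool and the key are all hypothetical. The realistic variant uses a keyed hash to pick a name, so the same input always maps to the same replacement.

```python
import hashlib
import hmac

MASK_KEY = b"demo-key"  # assumption: kept secret and separate in a real system
REPLACEMENT_NAMES = ["Anna", "Maria", "Elena", "Sofia"]  # hypothetical name pool


def mask_constant(first_name: str) -> str:
    # Every value becomes the same marker, so masked records are easy to spot.
    return "ANONYMIZED"


def mask_realistic(first_name: str) -> str:
    # Deterministically pick a plausible name so validations on name fields still pass.
    digest = hmac.new(MASK_KEY, first_name.encode("utf-8"), hashlib.sha256).digest()
    return REPLACEMENT_NAMES[digest[0] % len(REPLACEMENT_NAMES)]


constant_result = mask_constant("Jane")
realistic_result = mask_realistic("Jane")
```

The constant marker suits systems where it is useful to see at a glance that a record has been masked. The realistic replacement suits systems whose validations or test scenarios expect name-like values. With such a small pool, different people will often share a replacement name, and an original name that happens to be in the pool could map to itself, so a real implementation would need a larger pool and a check for that case.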
Research predicts that by 2022, half of the planet’s population will have its personal information covered by local privacy regulations in line with the GDPR. Privacy and IT professionals’ understanding of this topic is becoming a priority, as it is changing data architectures worldwide.
Check us out on Data Privacy Manager and let us know if you would like a demo of an automated Data Removal solution.