When the General Data Protection Regulation (GDPR) came into full enforcement, a new breed of privacy professional came into existence: the Data Protection Officer (DPO).
Privacy professionals existed before, of course, but they mainly focused on understanding laws and regulations. GDPR confronted them with another challenge: understanding how IT systems work in order to know where personal data is stored.
To achieve compliance and advance your privacy program, you need to conduct data discovery and understand what kind of data you process, where it is stored, and how you use it.
What is data discovery?
Data discovery is the process of locating, cataloging, and classifying information across IT systems. Its aim is to understand what kind of data is stored and processed so the business can derive value from it.
When it comes to personal data processing, data discovery is fundamental to achieving compliance.
Personal data discovery detects the personal data you hold and the processing activities your organization carries out. This allows you to put appropriate technical and organizational measures in place and manage personal data in a compliant way.
Why is data discovery important for compliance?
Over time, most organizations have collected and stored personal data across their systems for purposes that may no longer apply.
That data might be unused or hidden, and organizations might not even be aware of it. Nonetheless, the same GDPR principles and rules apply to that data as well.
Undetected personal data cannot be properly managed or protected. As a result, it is exposed to data breaches and ultimately represents a data protection risk.
Additionally, companies have been accumulating more personal data than ever.
In a way, most companies can be considered overweight, with extra proverbial pounds of accumulated personal data. The goal of data discovery is to get your company in shape.
To do so, identify who has access to personal data and which types of personal data the company collects, and minimize both. This not only reduces privacy risks but also improves overall data quality.
Building a data processing inventory
Data discovery is necessary to build your data processing inventory.
A data processing inventory is a repository of all data processing activities within your organization. It is the starting point of the compliance process, because compliance begins with understanding what personal data processing you do.
The best advice for anyone entrusted with this task is to approach it as a tabula rasa, setting aside assumptions about what data the organization processes and why.
Otherwise, you may overlook shadow processing and become overconfident in what you think you know.
The employees working with the data are the only ones who truly know the details needed to maintain compliant records of processing activities, so it is crucial to cooperate with different organizational units.
Discovering processing by conducting surveys
To uncover as many details about the data processing as possible, the discovery process should involve as many of the employees involved in the processing as possible.
To do so, create and distribute privacy impact assessment (PIA) surveys that cover the following groups of questions:
✅ What is the purpose of personal data processing?
At a minimum, you need to identify the business purpose behind the processing and define its lawful basis.
✅ How do you process the data?
Understand how the organization collects personal data and how it flows through the organization. Furthermore, understand where the data is stored, how it is distributed, and who has access to it. Find out whether the processing involves automated decision-making (including profiling) or sharing of data with others.
✅ How long do you keep the data?
Understand the entire lifecycle of personal data and define data retention periods describing how long the organization keeps the data and when it is removed from the organization’s archives.
✅ How is the data protected?
What is the organization doing to protect the data at rest and data in motion? Are any technical or organizational measures applied to the processing activity and/or systems processing the data?
✅ Who else is processing the data?
Who are the third parties involved in the processing? Who are you disclosing the data to? Make sure to work only with approved partners and vendors.
✅ What are the responsibilities linked to the processing?
Identify the stakeholders and understand the responsibilities within the organization.
✅ Are there any risks?
Assess security or privacy risks and define a mitigation strategy.
Once the surveys are completed, the relevant information populates the data processing inventory. This creates a solid foundation for further automation of privacy initiatives and provides stakeholders with factual information.
To keep the information credible, assign responsibilities to the right people. Even so, always account for the risk of human error.
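Part of this bookkeeping can be automated. As a rough sketch, survey answers could be turned into inventory records, with unanswered question groups flagged for follow-up, since gaps are among the easiest errors to catch. The `ProcessingActivity` record, its field names, and the `record_from_survey` and `missing_answers` helpers are hypothetical illustrations with one field per question group above, not a prescribed format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical inventory record: one field per PIA question group.
@dataclass
class ProcessingActivity:
    name: str
    purpose: Optional[str] = None           # What is the purpose?
    lawful_basis: Optional[str] = None      # ...and its lawful basis
    data_flow: Optional[str] = None         # How do you process the data?
    retention_period: Optional[str] = None  # How long do you keep the data?
    safeguards: Optional[str] = None        # How is the data protected?
    third_parties: List[str] = field(default_factory=list)  # Who else processes it?
    owner: Optional[str] = None             # Who is responsible?
    risks: Optional[str] = None             # Are there any risks?

# Fields that must be answered before the record counts as complete (assumption).
REQUIRED = ("purpose", "lawful_basis", "data_flow", "retention_period",
            "safeguards", "owner", "risks")

def record_from_survey(name: str, answers: Dict[str, object]) -> ProcessingActivity:
    """Build an inventory record from survey answers, ignoring unknown keys."""
    known = {k: v for k, v in answers.items()
             if k in ProcessingActivity.__dataclass_fields__ and k != "name"}
    return ProcessingActivity(name=name, **known)

def missing_answers(activity: ProcessingActivity) -> List[str]:
    """Return the required fields left blank, so someone can follow up."""
    return [f for f in REQUIRED if not getattr(activity, f)]

# Example with made-up answers for a hypothetical payroll activity.
payroll = record_from_survey("Payroll", {
    "purpose": "Pay employee salaries",
    "lawful_basis": "Contract",
    "retention_period": "10 years",
    "third_parties": ["Payroll provider"],
})
print(missing_answers(payroll))  # ['data_flow', 'safeguards', 'owner', 'risks']
```

Flagging blanks does not verify that the answers given are correct; it only shows where the inventory is visibly incomplete.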
Automated data discovery
An alternative to the traditional approach, which uses employee and vendor surveys to collect information about data types and data processing, is automated data discovery and classification.
The traditional way of collecting information can often be slow and unreliable, as there is no guarantee that the information provided is correct.
Information system owners may not have complete or up-to-date information about which types of data are stored and used in the systems they manage.
They may rely on manual analysis or on older discovery tools that produce too many false positives and false negatives.
Unlike discovering processing by collecting information from staff, automated data discovery is a technical process.
When using automation, privacy teams do not have to trust anyone in the company to correctly provide information about the types of personal data they store and use.
This takes you a step further toward building a robust governance framework for managing privacy risks.
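To give a sense of what such a technical process can look like, here is a minimal sketch of pattern-based classification. It is one common technique, not necessarily how any particular tool works: it samples values from a column and assigns a data domain when a large enough share of them match a regular expression. The two patterns, the `min_ratio` threshold, and the `classify_column` helper are illustrative assumptions; real tools combine many more signals to keep false positives and false negatives down.

```python
import re
from typing import Iterable, Optional

# Illustrative patterns only; production detectors need far more robust rules.
DOMAIN_PATTERNS = {
    "e-mail address": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "blood type": re.compile(r"(?:AB|A|B|O)[+-]"),
}

def classify_column(values: Iterable[str], min_ratio: float = 0.8) -> Optional[str]:
    """Return the data domain whose pattern fully matches at least `min_ratio`
    of the non-empty sampled values, or None if no domain qualifies."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return None
    best_domain, best_ratio = None, 0.0
    for domain, pattern in DOMAIN_PATTERNS.items():
        hits = sum(1 for v in sample if pattern.fullmatch(v))
        ratio = hits / len(sample)
        # Requiring a high share of matches tolerates a few junk values
        # without labeling a column on the strength of one stray match.
        if ratio >= min_ratio and ratio > best_ratio:
            best_domain, best_ratio = domain, ratio
    return best_domain

print(classify_column(["anna@example.com", "li.wei@example.org", "jo@example.net",
                       "ola@example.de", "kim@example.com", "n/a"]))  # e-mail address
print(classify_column(["A+", "O-", "AB+", "B-"]))  # blood type
print(classify_column(["Paris", "Berlin"]))        # None
```

The threshold is a trade-off: raising it reduces false positives but misses columns with messy data, while lowering it does the opposite.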
The output of the data discovery process is a repository of:
- data domains (name, e-mail address, VAT number, racial or ethnic origin, blood type, family status, employment status, etc.),
- data categories (contact information, employment information, medical information, etc.) and
- technical coordinates of the data (systems, databases, schemas, tables, columns, folders, documents, etc.) linked with the data processing inventory.
By discovering the data, you may also identify processing that has not been accounted for, or even processing done without a business purpose.
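The repository described above can be modeled as discovery records that pair a data domain and category with technical coordinates. In the sketch below, the `Discovery` record, the domain-to-category mapping, and the representation of the inventory as a mapping from activity name to the tables it covers are all hypothetical simplifications (unstructured coordinates such as folders and documents are left out for brevity). Any discovered location that no inventory activity claims is a candidate for processing that has not been accounted for.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical mapping of data domains to data categories.
DOMAIN_CATEGORY = {
    "name": "contact information",
    "e-mail address": "contact information",
    "employment status": "employment information",
    "blood type": "medical information",
}

Location = Tuple[str, str, str, str]  # (system, database, schema, table)

@dataclass(frozen=True)
class Discovery:
    domain: str    # e.g. "e-mail address"
    system: str    # technical coordinates of the data
    database: str
    schema: str
    table: str
    column: str

    @property
    def category(self) -> str:
        return DOMAIN_CATEGORY.get(self.domain, "uncategorized")

    @property
    def location(self) -> Location:
        return (self.system, self.database, self.schema, self.table)

def unaccounted(discoveries: List[Discovery],
                inventory: Dict[str, Set[Location]]) -> List[Discovery]:
    """Return discoveries whose table is not covered by any inventory activity."""
    covered: Set[Location] = set().union(*inventory.values())
    return [d for d in discoveries if d.location not in covered]

# Made-up example: the legacy HR table is not linked to any processing activity.
found = [
    Discovery("e-mail address", "CRM", "crm_db", "public", "customers", "email"),
    Discovery("blood type", "HR", "hr_db", "legacy", "medical_notes", "bt"),
]
inventory = {"Customer support": {("CRM", "crm_db", "public", "customers")}}
for d in unaccounted(found, inventory):
    print(d.table, d.category)  # medical_notes medical information
```

Matching at table level is a deliberate simplification; an inventory could just as well link activities to individual columns.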
DPM Data Discovery
DPM Data Discovery is a privacy-centric data discovery solution that connects to all your data stores, scans the data regularly, and reports the discoveries.
- Language-agnostic and script-agnostic to cover all your markets no matter the language or the script in use
- Discovers personal data from structured and unstructured sources
- Connects to all standard databases
- No third parties, no personal data in the cloud
- Automatically searches for personal data
- Uncovers dark data and shadow processing
- Independent of privacy software in use