Why is Data Discovery important for compliance? [Infographic]

When privacy laws such as the GDPR first came into full enforcement, a new breed of privacy professional came into existence: the DPO (Data Protection Officer).

Privacy professionals existed before, of course. But the GDPR forced those who had focused mostly on understanding laws and regulations to face another challenge: understanding how the IT systems that store the data actually work.


Because this is a demanding task, cooperation with other organizational units is of the utmost importance for successful compliance.

What is data discovery?

First things first, what is data discovery?

Data discovery is the process of locating, cataloging, and classifying information stored across different systems.

The goal of data discovery is to understand what kind of data is being stored and processed so that the business can derive value from it.

In terms of compliance and personal data processing, data discovery is fundamental to detecting and understanding all data processing activities.

Data discovery process

The data discovery process is necessary to build your data processing inventory.

A data processing inventory is a repository of all data processing activities within your company or organization. It marks the beginning of the compliance process, because everything starts with understanding what kind of personal data processing you do.

The best advice for anyone entrusted with this task is to start as a tabula rasa, without assumptions about what data the Organization processes and why.

Otherwise, you risk failing to record a lot of shadow processing (processing that nobody has formally documented) and being overconfident in what you think you know.

The employees working with the data are the only ones who truly know the details needed to maintain the records of processing.

Discover the processing by conducting surveys

To uncover as many details about the data processing as possible, the discovery process should involve as many of the employees who take part in that processing as you can.

The best way to do that is to create and distribute privacy impact assessment (PIA) surveys that include the following groups of questions:

What is the purpose of personal data processing?

At a minimum, you need to identify the business purpose behind the processing and define its lawful basis.

How do you process the data?

Understand how the Organization collects personal data and how the data flows through the Organization. Furthermore, understand where the data is stored, how it is distributed, and who has access to it. Find out whether the processing involves automated decision-making (including profiling) or sharing of data with others.

How long do you keep the data?

Understand the full lifecycle of personal data and define retention periods that describe how long the Organization keeps the data and when it is removed from the Organization's archives.

How is the data protected?

What is the Organization doing to protect data at rest and data in motion? Which technical or organizational measures are applied to the processing activity and to the systems that process the data?

Who else is processing the data?

Which third parties are involved in the processing? To whom are you disclosing the data? Make sure to work only with approved partners and vendors.

What are the responsibilities linked to the processing?

Identify the stakeholders and understand the responsibilities within the Organization.

Are there any risks?

Assess security or privacy risks and define a mitigation strategy.
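The question groups above map naturally onto the fields of a single entry in the data processing inventory. The following is a minimal Python sketch, assuming a hypothetical `ProcessingActivity` record with one field per question group and a hypothetical `missing_answers` helper that flags survey answers still left empty. The field names and the choice of which fields are mandatory are illustrative assumptions, not the schema of any particular privacy tool.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingActivity:
    """One entry in a data processing inventory (hypothetical schema).

    Each field corresponds to one group of PIA survey questions.
    """
    name: str
    purpose: str = ""        # What is the purpose of the processing?
    lawful_basis: str = ""   # ...and what is its lawful basis?
    data_flow: str = ""      # How do you process the data?
    retention: str = ""      # How long do you keep the data?
    protection_measures: list[str] = field(default_factory=list)  # How is it protected?
    third_parties: list[str] = field(default_factory=list)        # Who else processes it?
    owner: str = ""          # Who is responsible for the processing?
    risks: list[str] = field(default_factory=list)                # Are there any risks?


# Assumption: lists such as third_parties or risks may legitimately be empty,
# so only these fields are treated as answers every survey must provide.
REQUIRED_FIELDS = ("purpose", "lawful_basis", "data_flow", "retention", "owner")


def missing_answers(activity: ProcessingActivity) -> list[str]:
    """Return the required fields that respondents left empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(activity, name)]


payroll = ProcessingActivity(
    name="Payroll",
    purpose="Paying employee salaries",
    lawful_basis="Contract",
    owner="HR department",
)
print(missing_answers(payroll))  # ['data_flow', 'retention']
```

Gaps reported this way tell you which survey respondents to follow up with before the entry can be trusted.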


Once the surveys have been distributed and answered, relevant information starts populating the data processing inventory. This creates a solid foundation for further automating privacy initiatives and provides stakeholders with factual information.

To make sure the information is credible, assign responsibilities to the right people.

However, because the information is collected from people, there is always a possibility of error.

What more can you do to make sure every little bit of personal data processing is recorded?


Discover the data

As we mentioned before, data discovery is an iterative process of finding and classifying the data the Organization works with.

Unlike discovering processing activities by collecting information from staff, this is a technical process that requires knowledge of IT systems and storage.

The output of the data discovery process is a repository of data domains (e.g. name, e-mail address, VAT number, racial or ethnic origin, blood type, family status, employment status), data categories (contact information, employment information, medical information, etc.), and the technical coordinates of the data (systems, databases, schemas, tables, columns, folders, documents, etc.). This repository should be linked to the data processing inventory.

By discovering the data, you may also identify processing that has not been accounted for, and even processing done without a justifiable purpose.
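Linking discovered data to the inventory is what exposes such unaccounted processing: any location holding personal data that no recorded processing activity refers to deserves a closer look. Below is a minimal Python sketch of that cross-check, assuming a hypothetical `DataLocation` record for the technical coordinates and a plain dictionary standing in for the links between processing activities and data locations. All names here are illustrative, not part of any product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in sets
class DataLocation:
    """Technical coordinates of one discovered data item (hypothetical schema)."""
    system: str
    database: str
    table: str
    column: str
    domain: str    # e.g. "e-mail address"
    category: str  # e.g. "contact information"


def unaccounted_locations(
    discovered: list[DataLocation],
    inventory_links: dict[str, set[DataLocation]],
) -> list[DataLocation]:
    """Return discovered locations that no processing activity links to.

    `inventory_links` (an assumed structure) maps each processing activity
    name to the data locations it uses. Leftover locations point to
    processing the surveys did not capture.
    """
    linked = set().union(*inventory_links.values())
    return [loc for loc in discovered if loc not in linked]


email = DataLocation("CRM", "crm_db", "customers", "email",
                     "e-mail address", "contact information")
blood = DataLocation("HRIS", "hr_db", "employees", "blood_type",
                     "blood type", "medical information")

links = {"Newsletter": {email}}
for loc in unaccounted_locations([email, blood], links):
    print(loc.system, loc.table, loc.column, loc.domain)
# HRIS employees blood_type blood type
```

In this toy example, the blood type column is personal (and sensitive) data that no recorded activity explains, which is exactly the kind of shadow processing the surveys can miss.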

In complex IT architectures, data discovery is tedious to do manually, which is why using data discovery software is recommended.

Some data privacy software provides tools to automate the data discovery process and link the results to the data processing inventory.
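To give a feel for what such automation does under the hood, here is a minimal Python sketch of one common technique: sampling the values of a column and matching them against patterns for known data domains. The rule set, the `classify_column` function, and the 80% match threshold are hypothetical illustrations; real discovery tools typically combine many more signals (checksums, dictionaries, column names, machine learning), and these simple regular expressions will produce both false positives and misses.

```python
import re

# Hypothetical detection rules: data domain -> (data category, pattern).
# Deliberately simple; production rules are far more thorough.
RULES = {
    "e-mail address": ("contact information", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")),
    "phone number": ("contact information", re.compile(r"\+?[\d\s()-]{7,}")),
    "blood type": ("medical information", re.compile(r"(A|B|AB|O)[+-]")),
}


def classify_column(samples: list[str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Guess the data domain(s) of a column from a sample of its values.

    A domain is reported when at least `threshold` of the non-empty
    samples fully match its pattern.
    """
    values = [v.strip() for v in samples if v and v.strip()]
    if not values:
        return []
    hits = []
    for domain, (category, pattern) in RULES.items():
        matched = sum(1 for v in values if pattern.fullmatch(v))
        if matched / len(values) >= threshold:
            hits.append((domain, category))
    return hits


print(classify_column(["ana@example.com", "ivan@example.org", "marko@example.net"]))
# [('e-mail address', 'contact information')]
print(classify_column(["A+", "O-", "AB+", "B-"]))
# [('blood type', 'medical information')]
```

Using a threshold rather than requiring every sample to match tolerates a few malformed or placeholder values in real data, at the cost of occasionally flagging a column that only partly holds personal data.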
