Companies today need to quickly grasp both the sheer volume and the importance of the data they collect and process.
With an increasing number of real-life use cases for new groundbreaking technologies, like AI, Fintech, DNA sequencing, and autonomous vehicles, everything is digital and generates a lot of data.
The fact is, you cannot create a competitive digital product without having enough data to start with. Furthermore, unused, unmanaged, and unprotected data can create additional costs and financial, legal, and privacy-related risks.
Before you start with data discovery…
Any effort to advance your privacy program must start with the understanding of what your company is doing with personal data, and to do so you will need to chart a data map.
When given the task of mapping all personal data processed by your company, you can expect to encounter large volumes of personal data in places you would not expect it.
Data might be ancient and forgotten or heavily used but not adequately protected. Whatever the case might be, you need to discover as much as you can and chart a data map of your company.
The more information you have about the data, the better you can protect it. Keep in mind that not a single person in the company knows where all personal data is processed.
The goal of the privacy program is to get your company in shape. To do so, you need to identify who has access to personal data and which types of personal data the company collects, and then minimize both.
Getting the company in shape not only reduces privacy risks but also results in better use and quality of data and improved business processes.
The first step in this process must evidently be discovering all personal data.
Starting with the Data Discovery
When starting with data discovery, you will most likely begin with the usual suspects like Marketing, Sales, or Human Resources.
You will want to look at the applications they use and examine how their processes work. This might already be well described in the Records of Processing Activities.
If you are a B2C company, you will probably have more personal data within your core business systems. And if your company makes strategic decisions based on data, you will have even more in your data lake or data warehouse databases.
The fun starts once you have a complete and inclusive list of systems.
Open yourself to the experience of learning about different types of data structures, databases, applications, and data archives.
Choosing the right approach
Choosing the right approach to data discovery can be critical, so carefully consider different options.
Decentralized approach
With a decentralized approach, you will delegate discovery to different owners.
Different owners will be in charge of managing the inventory of personal data under their jurisdiction, and you will always have up-to-date information about the data types they process.
This is a very elegant approach to data discovery, especially if you have clearly defined data ownership.
The owners can be in-house for systems managed by your company or third parties for systems managed externally.
Centralized approach
Another approach would be for your privacy team to conduct data discovery independently.
For this to work, you will have to add technical personnel, such as data protection engineers, to your privacy team.
Whichever approach you choose, you will want to make sure that:
- You can trust the information you receive, and
- All personal data is discovered
You will also want to get notified when new types of personal data are discovered and new systems are introduced to the company’s IT landscape.
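One way to picture the notification requirement is as a comparison between two snapshots of the data inventory. The sketch below is only an illustration under assumed names: the inventory shape (system name mapped to a set of data type labels), the function `diff_inventories`, and the sample systems are all hypothetical and not taken from any particular product.

```python
from typing import Dict, List, Set

# Hypothetical inventory shape: system name -> personal data types found there.
Inventory = Dict[str, Set[str]]


def diff_inventories(previous: Inventory, current: Inventory) -> dict:
    """Report systems and data types present in `current` but not in `previous`."""
    new_systems: List[str] = sorted(set(current) - set(previous))
    new_types: Dict[str, List[str]] = {}
    for system, types in current.items():
        added = types - previous.get(system, set())
        # Brand-new systems are reported separately, so skip them here.
        if added and system not in new_systems:
            new_types[system] = sorted(added)
    return {"new_systems": new_systems, "new_types": new_types}


# Made-up sample snapshots from two consecutive scans.
previous = {"crm": {"email", "name"}, "hr": {"name", "iban"}}
current = {
    "crm": {"email", "name", "phone"},
    "hr": {"name", "iban"},
    "helpdesk": {"email"},
}

changes = diff_inventories(previous, current)
print(changes)
```

In practice the result would feed whatever alerting channel the privacy team already uses; the comparison itself stays the same regardless of whether the snapshots come from owners or from an automated scan.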
If you have ever tried to map personal data within the company, you are experienced enough to know that discovery and classification can get very technical very fast.
With the growing number of applications, the growing volume of data, and the increasing complexity of the data architecture, automated data discovery can prove to be the only reliable option.
Besides, we live in a time of robotization and the accelerated application of AI. It would not make a lot of sense from a business perspective to have people sift through large volumes of data and manually label personal data types.
Opt for a privacy-centric solution
Data discovery software is designed to automate data discovery and classification, which allows companies to continuously monitor the sensitive data they process.
Forrester defines sensitive data discovery and classification as “The capability to provide visibility into where sensitive data is located, identify what this sensitive data is and why it is considered sensitive, and tag or label this data based on its level of sensitivity.”
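The three parts of that definition (locate, identify, tag) can be illustrated with a deliberately simple pattern-based sketch. This is an assumption-laden toy, not how any commercial classification engine works: the two regular expressions, the sensitivity labels, and the function names `luhn_valid` and `classify` are hypothetical, and real tools cover far more data types, languages, and formats.

```python
import re

# Illustrative patterns only; real classifiers need much broader coverage.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}

# Assumed sensitivity levels, chosen for the example.
SENSITIVITY = {"email": "medium", "credit_card": "high"}


def luhn_valid(candidate: str) -> bool:
    """Check the Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def classify(location: str, text: str) -> list:
    """Return where a data type was found, what it is, and its sensitivity tag."""
    findings = []
    for data_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # The checksum filters out digit runs that only look like card numbers.
            if data_type == "credit_card" and not luhn_valid(match.group()):
                continue
            findings.append({
                "location": location,
                "type": data_type,
                "sensitivity": SENSITIVITY[data_type],
            })
    return findings


sample = ("Contact jane@example.com, card 4111 1111 1111 1111, "
          "order 1234 5678 9012 3456.")
findings = classify("support_tickets.txt", sample)
for finding in findings:
    print(finding)
```

The findings record only the location, type, and tag rather than the matched value itself, so the inventory does not become another copy of the personal data.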
Privacy-centric data discovery software specializes in discovering and classifying personal data, thus providing privacy data intelligence as well as some remediation capabilities.
With automation, privacy teams no longer have to rely on others in the company to correctly report the types of personal data they store and use.
That is a step further in building a robust governance framework for managing privacy risks.
A traditional approach to building personal data mapping uses employee and vendor surveys to collect information about data types and where data is processed.
This kind of information flow can often be slow and unreliable, as there are no guarantees that provided information is correct.
Information system owners may not have complete or up-to-date information about which types of data are stored and used in the systems they manage.
They may rely on manual analysis or older traditional discovery tools with too many false-positive and false-negative discoveries.
Also, they can have high-priority projects running and not enough resources to provide all needed information to the Privacy team.
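The false-positive and false-negative problem mentioned above can be made measurable by comparing a tool's output against a small, hand-verified sample. The sketch below assumes a hypothetical setup in which each discovery is identified by a `table.column` string; the sample sets are made up for illustration.

```python
def precision_recall(predicted: set, actual: set) -> tuple:
    """Precision drops with false positives, recall drops with false negatives."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hand-verified personal data locations (made-up sample).
actual = {"customers.email", "customers.phone", "orders.card_number", "hr.iban"}
# What a hypothetical discovery tool reported for the same systems.
predicted = {"customers.email", "orders.card_number", "orders.order_id"}

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here one reported column is not personal data (a false positive) and two real locations were missed (false negatives), which is exactly the kind of gap that undermines trust in the resulting inventory.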
DPM Data Discovery
We have experienced a multitude of problems when mapping personal data for our clients.
Even when using existing data discovery solutions, we could never fully trust the information they provided. The existing software could not scan both structured and unstructured data or could not label personal data in other languages or scripts.
That would result in an incomplete and erroneous data inventory, which proved to be detrimental to the success of any privacy program.
That is why we have invested in building data discovery software we can trust. Our classification engine is the result of multi-year research and experimentation with available AI technologies.
We focused on the precise classification of personal data using bleeding-edge technologies that work for all types of textual data. DPM Data Discovery can connect to all your data stores, scan the data regularly and report the discoveries.
Furthermore, it combines the information from the Data Privacy Manager platform with the data discovery results.
In this way, it provides a trusted source of information and answers questions such as: What special category data is not encrypted? How many occurrences of a credit card number do we have in System X? What personal data is hosted in country Y?
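To show what answering such questions can look like once discovery results are combined with system metadata, here is a minimal sketch over an in-memory inventory. The `Finding` record, its fields, and the sample rows are assumptions made for this illustration and do not reflect the actual DPM data model or API.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical inventory row: one data type found in one system."""
    system: str
    data_type: str
    occurrences: int
    country: str            # hosting country of the system
    encrypted: bool
    special_category: bool


# Made-up sample inventory.
inventory = [
    Finding("System X", "credit_card_number", 1200, "DE", True, False),
    Finding("System X", "health_record", 40, "DE", False, True),
    Finding("Data lake", "email", 50000, "US", False, False),
    Finding("Data lake", "credit_card_number", 300, "US", True, False),
]

# What special category data is not encrypted?
unencrypted_special = [
    f for f in inventory if f.special_category and not f.encrypted
]

# How many occurrences of a credit card number do we have in System X?
cards_in_system_x = sum(
    f.occurrences
    for f in inventory
    if f.system == "System X" and f.data_type == "credit_card_number"
)

# What personal data is hosted in country Y (here: US)?
types_in_us = sorted({f.data_type for f in inventory if f.country == "US"})

print(unencrypted_special, cards_in_system_x, types_in_us)
```

The point of the sketch is that each question becomes a simple filter once the encryption status, hosting country, and classification results live in the same record.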
Using this information, the privacy team can identify privacy risks, get a deep insight into the data flows of the company, and often uncover both dark data and shadow processing.
What makes DPM Data Discovery unique?
- Connects to all standard databases, file share locations, SaaS applications, and other types of data sources
- Works with all file types, such as text, Excel sheets, PDF, CSV, e-mails, log files, social network interactions, and others
- Labels personal data in any language and any script
- Works with structured and unstructured data