Three reasons why data masking can’t completely protect your data

Far-reaching GDPR legislation will impact every company across the globe that maintains or processes the personal data of EU residents.

In April 2016, the EU adopted the General Data Protection Regulation (GDPR), which will significantly change how companies use personal data in their testing and other development environments. What does this mean for developers and testers who rely on production data for testing their applications?

Many companies use some form of masking when handling production data for testing purposes. The process is essentially this: make a copy of the production data, mask it (replace personal data with dummy data), subset it into usable chunks, and save it off for use in testing applications.
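The copy-mask-subset workflow can be sketched in a few lines. This is a minimal illustration, not a production tool; all field names (`customer_id`, `name`, `email`) and the hashing scheme are hypothetical assumptions for the example.

```python
import hashlib
import random

def mask_record(record):
    """Replace personal fields with dummy values, keeping the row's shape intact."""
    masked = dict(record)
    # Deterministic pseudonym: the same input always maps to the same token,
    # so joins across tables on the masked value still line up.
    masked["name"] = "user_" + hashlib.sha256(record["name"].encode()).hexdigest()[:8]
    masked["email"] = masked["name"] + "@example.com"
    return masked

def subset(records, fraction=0.1, seed=42):
    """Take a reproducible sample of the masked copy for test environments."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]

production = [
    {"customer_id": 1, "name": "Alice Jones", "email": "alice@corp.com"},
    {"customer_id": 2, "name": "Bob Smith", "email": "bob@corp.com"},
]

# Copy, mask, then subset (fraction=1.0 here so the tiny sample keeps both rows).
test_data = subset([mask_record(r) for r in production], fraction=1.0)
```

Note that even this toy version hints at the tension the article describes: the pseudonyms must stay consistent for referential integrity, which is precisely what leaves structure behind for an attacker to exploit.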

While masking is a useful component of test data management, it’s not enough. Given the new standards of data privacy set by GDPR, we need to look at innovative ways to generate data that will exceed compliance regulations, provide greater security and help build a truly agile test environment.

Here are three key reasons why your data protection should not be left to masking alone:

Reason #1: Masking does not eliminate all sensitive data

Effectively masking production data so that it is fit for purpose in test and development environments is prohibitively complicated. While the sensitive content of the data must be completely removed, the inter-column relationships must remain intact.

Due to the complexity of consistently masking every aspect of data, most masking focuses on the content of the data, along with certain intra- and inter-system relationships, to ensure that referential integrity is maintained. However, masked data can be cracked with a single piece of external information. One often-quoted study found that 87 percent of Americans could be uniquely identified from just three quasi-identifiers: date of birth, gender and ZIP code.
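The re-identification risk comes down to a simple join. The sketch below, with entirely made-up records and field names, shows how an attacker could match a "masked" row against a public record using only date of birth, gender and ZIP code:

```python
# Hypothetical illustration: the masked dataset has removed names, but keeps
# dob/gender/zip intact for statistical and referential use.
masked_rows = [
    {"dob": "1984-07-01", "gender": "F", "zip": "60614", "diagnosis": "X"},
    {"dob": "1990-03-12", "gender": "M", "zip": "10027", "diagnosis": "Y"},
]

# A single piece of external information, e.g. from a public voter roll.
public_record = {"name": "Jane Doe", "dob": "1984-07-01", "gender": "F", "zip": "60614"}

# Join on the three quasi-identifiers.
matches = [
    r for r in masked_rows
    if (r["dob"], r["gender"], r["zip"])
       == (public_record["dob"], public_record["gender"], public_record["zip"])
]
# One match is enough: the "anonymous" diagnosis row now has a name attached.
```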

Reason #2: Masked data still says a lot

Even if the data has been masked in its entirety, commercially sensitive information can still be deduced in the form of functional requirements. The inter-column relationships of production data necessarily reflect how a system operates. Because these relationships are maintained in the masked data, a skilled eye can learn much about a system from the type of data that passes through it, effectively revealing how the systems central to an organization's operations function, along with any weaknesses that might be exploited.

In other words, properly “anonymizing” data such that nobody can be identified from it requires that data to be masked at a highly granular level, and that remaining content cannot be pieced together from multiple “masked” databases.

Reason #3: The real danger is human error

A large percentage of data breaches are caused by insiders, and masking production data cannot eliminate this threat. As long as production data leaves production environments in any form, organizations risk substantial fines. Under GDPR, those fines can reach 4 percent of annual global turnover for a company found in violation.

Test data management reconsidered

As long as test teams are reliant on production data alone, they will be unable to run the tests needed to deliver valuable software that reflects changing business demands. Their projects will constantly be stalled by a lack of adequate data, and they will find themselves deciding between waiting for ‘fit for purpose’ data to become available, and not running the tests required to detect potentially critical defects.

Only a complete, end-to-end approach to Test Data Management, driven by requirements, allows organizations to remove testing bottlenecks, mitigate risk and minimize defect creation. As a necessary part of this, organizations should consider the value synthetic data generation might bring, as well as how they store, manage and provision data.
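Synthetic data generation sidesteps the problem entirely: test rows are built from rules derived from requirements (formats, ranges, boundary values), so production data never leaves production. A minimal sketch, with hypothetical field names and generation rules:

```python
import random
import string

def synthetic_customer(rng):
    """Generate one customer row from rules, not from production data."""
    return {
        "customer_id": rng.randrange(1, 10**6),
        "name": "Test " + "".join(rng.choices(string.ascii_uppercase, k=6)),
        "zip": f"{rng.randrange(0, 100000):05d}",          # valid 5-digit format
        "age": rng.randrange(18, 90),                      # range a test plan might require
    }

# Seeded generator so a test run is reproducible.
rng = random.Random(0)
customers = [synthetic_customer(rng) for _ in range(100)]
```

Because the rules are explicit, a team can also generate the edge cases (maximum lengths, boundary ages, unusual formats) that rarely appear in a masked production sample.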

Check out the recent "Agile Test Data Management: The New Must-Have" paper by Forrester Research to learn more about how to test in an agile way using the right data and the right test data management strategy.

Jeff Scheaffer is general manager of the Continuous Delivery Business Unit at CA Technologies.
