The EU General Data Protection Regulation has landed. Will you be ready?
Upcoming legislation looks set to change the way in which data can be used in testing
I’ve written on several occasions about the new EU General Data Protection Regulation (GDPR) and the implications it might have for organizations both within and outside the EU. The text of the Regulation has now been finalized and is set for final approval in mid-2016, followed by a two-year implementation period.
News headlines have typically focused on the potentially massive fines for non-compliance (up to 4% of an organization’s worldwide annual turnover or 20 million euros, whichever is higher), as well as on the implications for transferring data across borders and how this might affect Cloud technology and current tech giants.
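The headline ceiling is simply the larger of two figures. A minimal sketch, assuming turnover is expressed in euros for the preceding financial year; the function name is hypothetical, and it computes the maximum the top fine tier allows, not what a regulator would actually impose.

```python
def max_gdpr_fine_eur(annual_turnover_eur: float) -> float:
    """Ceiling of the top GDPR fine tier: 4% of worldwide annual turnover
    or EUR 20 million, whichever is higher (a cap, not a predicted fine)."""
    four_percent = annual_turnover_eur * 4 / 100
    return max(four_percent, 20_000_000.0)


# EUR 100m turnover: 4% is EUR 4m, so the EUR 20m figure sets the cap.
print(max_gdpr_fine_eur(100_000_000))    # 20000000.0
# EUR 1bn turnover: 4% is EUR 40m, which exceeds EUR 20m.
print(max_gdpr_fine_eur(1_000_000_000))  # 40000000.0
```

For most mid-sized organizations the fixed EUR 20 million figure, rather than the percentage, is the binding ceiling.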
I’m more interested in the implications the GDPR will have for testing and development, and especially its possible repercussions for Test Data Management best practice. Some of these are set out below.
It has been de rigueur to use masked production data in test and development environments. I’ve written before about why masking data does not guarantee security, but the GDPR presents further reasons for organizations to reconsider their test data management strategy, especially if the required “Data Protection by Design” principle is to be applied to testing.
Many test teams we’ve worked with, for example, are unsure exactly where personal data exists in their test databases. Data models are frequently highly complex and poorly understood, with personal information stored inconsistently across test environments and disparate, inter-related test databases: a credit card number might, for example, turn up in a free-text “notes” column.
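A card number hidden in a free-text column shows why checking column names is not enough. The sketch below, assuming Python 3.9+, scans text for runs of 13 to 19 digits (optionally separated by spaces or hyphens) and keeps only those passing the Luhn checksum that payment card numbers use. The function names are hypothetical, and a Luhn match is a strong hint rather than proof: roughly one in ten arbitrary digit strings also passes.

```python
import re

# Candidate card numbers: 13 to 19 digits, each optionally followed by a space or hyphen.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return Luhn-valid, card-like digit runs found anywhere in free text."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())  # strip separators
        if luhn_valid(digits):
            hits.append(digits)
    return hits


print(find_card_numbers("Cust. called re order; card 4111 1111 1111 1111 exp 12/25"))
# ['4111111111111111']
print(find_card_numbers("Ref 1234 5678 9012 3456"))
# []
```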
What’s more, numerous audits we’ve performed using CA Test Data Manager’s data discovery capabilities have found personal information in masked databases we had been assured were “clean”. Yet, if exercised, the Right to Data Portability and the Right to Erasure prescribed by the GDPR will require that the data an organization stores about an EU citizen is either provided to them or deleted, which cannot be done reliably if the organization does not know where that data is held.
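Honouring portability and erasure requests is straightforward only when every place a person’s data lives is known and linked. A minimal sketch using Python’s built-in sqlite3 module and a hypothetical two-table schema: the subject’s rows are exported as JSON, then deleted, with `ON DELETE CASCADE` removing dependent rows. Note what it cannot do: a copy of the same person’s details pasted into an unrelated notes column would survive untouched.

```python
import json
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory test database with a hypothetical customer/orders schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                REFERENCES customer(id) ON DELETE CASCADE,
            notes TEXT
        );
        INSERT INTO customer VALUES (1, 'Ana', 'ana@example.com'),
                                    (2, 'Ben', 'ben@example.com');
        INSERT INTO orders VALUES (10, 1, 'call back'), (11, 1, NULL),
                                  (12, 2, 'gift wrap');
    """)
    return conn


def export_subject(conn: sqlite3.Connection, customer_id: int) -> str:
    """Portability: return the subject's rows as machine-readable JSON."""
    cust = conn.execute(
        "SELECT id, name, email FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    orders = conn.execute(
        "SELECT id, notes FROM orders WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    ).fetchall()
    return json.dumps({
        "customer": {"id": cust[0], "name": cust[1], "email": cust[2]},
        "orders": [{"id": oid, "notes": notes} for oid, notes in orders],
    })


def erase_subject(conn: sqlite3.Connection, customer_id: int) -> None:
    """Erasure: delete the subject; the FK cascade removes their orders."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))


conn = build_demo_db()
print(export_subject(conn, 1))
erase_subject(conn, 1)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1 (only Ben's order remains)
```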
With the further requirement to keep data only for as long as is needed to fulfil the explicit purposes for which it was collected, and to expose it only to as many people as are needed to fulfil those purposes (in line with the GDPR’s purpose limitation and storage limitation principles), the need to understand where data is and who is using it becomes imperative. The practice of testers copying and sharing masked data ad hoc, and keeping it on their machines indefinitely, is simply not viable, and significantly increases the risk of falling foul of the GDPR.
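Knowing who holds which copy, why, and until when is what makes a retention limit enforceable in a test estate. A minimal sketch of a hypothetical provisioning register, assuming Python 3.9+; the class, its fields, and the 30-day retention periods are illustrative assumptions, not part of any product. It flags copies that have outlived the purpose they were provisioned for.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ProvisionedCopy:
    """One entry in a hypothetical register of provisioned test data copies."""
    dataset: str
    holder: str          # who holds the copy
    purpose: str         # the task it was provisioned for
    provisioned_on: date
    retention_days: int  # agreed lifetime for that purpose (illustrative)

    def expires_on(self) -> date:
        return self.provisioned_on + timedelta(days=self.retention_days)


def overdue_copies(register: list[ProvisionedCopy], today: date) -> list[ProvisionedCopy]:
    """Copies kept past their retention period: delete or re-justify them."""
    return [entry for entry in register if entry.expires_on() < today]


register = [
    ProvisionedCopy("masked_crm", "alice", "release regression", date(2016, 1, 4), 30),
    ProvisionedCopy("masked_crm", "bob", "defect triage", date(2016, 5, 20), 30),
]
print([c.holder for c in overdue_copies(register, date(2016, 6, 1))])  # ['alice']
```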
Considering also the complexity of masking data while retaining the referential integrity needed for testing, the use of production data in non-production environments no longer appears to be the safe or simple solution. So, what technological and procedural changes might an organization make?
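One common way to keep referential integrity while masking is deterministic, keyed pseudonymisation: the same original key always maps to the same token, so joins between tables still line up. A minimal sketch using HMAC-SHA256 from Python’s standard library; the token format, table shapes, and key value are hypothetical. It also illustrates the caution above: anyone holding the secret key can recompute tokens from known IDs and re-link records, which is why the GDPR still treats pseudonymised data as personal data (Recital 26).

```python
import hashlib
import hmac


def mask_id(value: str, key: bytes, prefix: str = "CUST") -> str:
    """Deterministic keyed token: the same value and key always give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # 12 hex chars (48 bits) keeps tokens short; use more if collisions matter.
    return f"{prefix}-{digest[:12]}"


def mask_tables(customers, orders, key: bytes):
    """Mask the customer key consistently in both tables and drop the name."""
    masked_customers = [(mask_id(cid, key), "REDACTED") for cid, _name in customers]
    masked_orders = [(oid, mask_id(cid, key)) for oid, cid in orders]
    return masked_customers, masked_orders


# The key must live outside the test environment (hypothetical placeholder value).
SECRET_KEY = b"replace-with-a-managed-secret"
customers = [("C001", "Ana"), ("C002", "Ben")]
orders = [(10, "C001"), (11, "C002"), (12, "C001")]
masked_customers, masked_orders = mask_tables(customers, orders, SECRET_KEY)
print(masked_orders[0][1] == masked_customers[0][0])  # True: the join still works
```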
First, organizations need to understand fully what data they have, where it exists, and how it is interrelated. Automated data profiling can help here, using statistical analysis and mathematical filters to identify potentially personal information across inter-related databases and legacy components.
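A statistical filter of this kind can be as simple as measuring what fraction of a column’s values match a known pattern and flagging columns above a threshold, whatever the column happens to be called. A minimal sketch, assuming column values have already been sampled into Python lists; the two patterns, the sample data, and the 0.6 threshold are illustrative assumptions, and real discovery tools use far richer rule sets.

```python
import re

# Illustrative patterns only; real tools combine many more, plus dictionaries and checksums.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d -]{6,14}\d"),
}


def profile_columns(tables, threshold=0.6):
    """Return (table, column, label, match ratio) wherever enough values fit a pattern."""
    findings = []
    for table, columns in tables.items():
        for column, values in columns.items():
            sample = [str(v).strip() for v in values if v is not None]
            if not sample:
                continue
            for label, pattern in PATTERNS.items():
                ratio = sum(1 for v in sample if pattern.fullmatch(v)) / len(sample)
                if ratio >= threshold:
                    findings.append((table, column, label, round(ratio, 2)))
    return findings


sample_tables = {
    "customer": {
        "contact": ["ana@example.com", "ben@example.org", None, "n/a"],
        "city": ["Leeds", "York", "Bath", "Hull"],
    },
    "legacy_notes": {"field3": ["+44 20 7946 0958", "020 7946 0000", "01632 960001"]},
}
for finding in profile_columns(sample_tables):
    print(finding)
# ('customer', 'contact', 'email', 0.67)
# ('legacy_notes', 'field3', 'phone', 1.0)
```

The threshold trades missed columns against false alarms: the phone pattern, for instance, would also match long numeric reference codes.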
A data model can then be built from this profile, using cubed views to create a multi-dimensional picture of the relationships that exist in production data. At this point, data could be masked effectively for testing, but it is just as easy to generate synthetic data from scratch.
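This is not a description of how CA Test Data Manager’s cubed views work. As an illustration of one widely used technique for recovering relationships that the schema never declared, the sketch below proposes a column as a candidate foreign key when its values are contained in another column whose values are unique (an inclusion dependency). The table and column names are hypothetical; on real data, small integer domains produce false positives, so candidates need human review.

```python
def candidate_foreign_keys(columns, min_coverage=1.0):
    """columns maps (table, column) -> list of sampled values.

    Proposes (child, parent, coverage) where the parent column's values are
    unique and at least min_coverage of the child's values appear in it.
    """
    candidates = []
    for parent, parent_values in columns.items():
        parent_non_null = [v for v in parent_values if v is not None]
        if not parent_non_null or len(set(parent_non_null)) != len(parent_non_null):
            continue  # a referenced key column must be unique
        parent_set = set(parent_non_null)
        for child, child_values in columns.items():
            if child == parent:
                continue
            child_non_null = [v for v in child_values if v is not None]
            if not child_non_null:
                continue
            coverage = sum(v in parent_set for v in child_non_null) / len(child_non_null)
            if coverage >= min_coverage:
                candidates.append((child, parent, coverage))
    return candidates


columns = {
    ("customer", "id"): [1, 2, 3],
    ("customer", "age"): [34, 51, 27],
    ("orders", "id"): [10, 11, 12, 13],
    ("orders", "customer_id"): [1, 1, 3, 2],
}
print(candidate_foreign_keys(columns))
# [(('orders', 'customer_id'), ('customer', 'id'), 1.0)]
```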
CA Test Data Manager, for example, offers powerful synthetic data generation, capable of creating data with all the characteristics of production but none of the sensitive content. This can be fed into multiple test environments at once, and offers testers the substantial benefit that data can be created to cover 100% of possible tests, including the edge cases and negative scenarios that production data rarely contains.
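Generating data to cover every test scenario usually starts by deciding which classes of value matter for each attribute, then producing at least one record per combination. A minimal sketch of that idea using `itertools.product`; the attributes, classes, and value ranges are hypothetical, no production data is read, and the seeded generator keeps runs reproducible. Exhaustive combinations grow multiplicatively, so larger models typically move to pairwise or other combinatorial techniques.

```python
import itertools
import random

# Hypothetical equivalence classes per attribute, each with a value generator.
CLASSES = {
    "age": {
        "minor": lambda rng: rng.randint(0, 17),
        "adult": lambda rng: rng.randint(18, 64),
        "senior": lambda rng: rng.randint(65, 110),
    },
    "country": {
        "domestic": lambda rng: "GB",
        "eu": lambda rng: rng.choice(["DE", "FR", "IE"]),
        "non_eu": lambda rng: rng.choice(["US", "JP"]),
    },
    "email": {
        "valid": lambda rng: f"user{rng.randint(1000, 9999)}@example.test",  # reserved TLD
        "missing": lambda rng: None,
    },
}


def generate_covering_rows(classes, seed: int = 0):
    """One synthetic row per combination of classes (exhaustive coverage)."""
    rng = random.Random(seed)  # seeded so the generated test data is reproducible
    attrs = list(classes)
    rows = []
    for combo in itertools.product(*(list(classes[a]) for a in attrs)):
        row = {"id": len(rows) + 1}
        for attr, cls in zip(attrs, combo):
            row[attr] = classes[attr][cls](rng)
            row[attr + "_class"] = cls  # keep the class so tests can select rows
        rows.append(row)
    return rows


rows = generate_covering_rows(CLASSES)
print(len(rows))  # 18 = 3 age classes x 3 country classes x 2 email classes
```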
From a compliance perspective, this data can then be stored alongside existing masked production data in a central Test Data Warehouse, from which re-usable data sets can be provisioned to authorized individuals on demand, based on the task they are performing. Access can therefore be granted only if the data will be used for a purpose to which consent has been given, rather than solely on the basis of role.
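The difference between role-based and purpose-based provisioning comes down to one extra check. A minimal sketch, assuming each data set records the purposes its data subjects consented to and each request states the task it is for; the class names, purposes, and the rule that purely synthetic data skips the consent check (on the assumption that it contains no personal data) are illustrative, not a product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    consented_purposes: frozenset  # purposes the data subjects agreed to
    synthetic: bool = False        # assumed to contain no personal data


@dataclass(frozen=True)
class Request:
    user: str
    role: str
    purpose: str  # the task the data will be used for


def may_provision(dataset: DataSet, request: Request, authorized_roles: set) -> bool:
    """Role is necessary but not sufficient: the stated purpose must also be consented."""
    if request.role not in authorized_roles:
        return False
    if dataset.synthetic:
        return True  # no personal data, so no consent restriction (assumption)
    return request.purpose in dataset.consented_purposes


billing = DataSet("masked_billing", frozenset({"billing_regression"}))
testers = {"tester"}
print(may_provision(billing, Request("ana", "tester", "billing_regression"), testers))  # True
print(may_provision(billing, Request("ben", "tester", "marketing_analysis"), testers))  # False
```

Under a purely role-based rule, both requests above would be granted, since both users hold the same role.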
To learn about additional benefits of adopting a complete, end-to-end Test Data Management strategy, please download our white paper, Moving Beyond Masking and Subsetting: Realizing the Value of Test Data Management.