Do you have permission to use that data?
The recently finalized EU General Data Protection Regulation (GDPR) is set to underline the need for explicit consent when using EU citizens’ data. It will carry ramifications for the majority of organizations we’ve worked with who still use production data in test and development environments – including fines of up to 4% of annual global turnover.
That consent is needed to use personal data is not new, and many organizations already have privacy policies regarding the disclosure of information, as well as opt-in policies, box-checking mechanisms, and internal data privacy policies in place. So, what difference will the GDPR make?
Firstly, the GDPR sets out to tighten how consent is defined. In general terms, the burden of proof is placed on the data controller, and the final draft of the GDPR strikes a middle position between the demands for explicit consent and for unambiguous consent. Consent must be constituted by an affirmative action; the GDPR rules out, for example, “opt-out” consent or consent constituted by silence.
In other words, a greater burden is placed on data controllers to ensure that personal data is used only for the reason for which it was provided, unless its use is necessary for legal purposes, public interest, or to fulfil a legitimate interest of the controller. There is nothing substantially new here.
This is where the Regulation gets interesting for testing and development. It strengthens the requirements for data minimization and purpose limitation: data cannot be stored for longer than the minimum necessary to fulfil the reason for which it was given, and cannot be made accessible to an indefinite number of individuals either.
What’s more, EU citizens retain the right to withdraw their consent “at any time”, while there is also the much-debated right to data portability. This sets out that an EU citizen can request a copy of their data, in a format usable by them. They can go further, too, exercising the right to erasure, whereupon an organization must delete their data “without delay”.
The challenge is that many organizations we’ve worked with don’t fully understand what data they have, let alone who uses it and for what purposes. Data models are usually poorly understood, while commonly used role-based data access restrictions cannot limit the use of data once provisioned.
Because personal information is stored inconsistently, we’ve often found personal data in masked, non-production databases that we’d been assured were clean. Column-based data discovery, for example, might fail to find personal data stored in a “notes” or “other” column.
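One way to catch personal data hiding outside the obviously named columns is to scan free-text fields against content patterns, rather than relying on column names alone. A minimal sketch using SQLite, where the `PATTERNS` table is an illustrative assumption, not an exhaustive PII catalogue:

```python
import re
import sqlite3

# Illustrative patterns for common personal data; a real catalogue
# would be far broader (names, national IDs, addresses, ...).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}

def scan_free_text(conn, table, columns):
    """Scan free-text columns (e.g. 'notes') for values matching PII patterns."""
    findings = []
    cur = conn.execute(f"SELECT rowid, {', '.join(columns)} FROM {table}")
    for row in cur:
        rowid, *values = row
        for col, value in zip(columns, values):
            for label, pattern in PATTERNS.items():
                if value and pattern.search(value):
                    findings.append((table, col, rowid, label))
    return findings

# Usage: an email address hiding in a "notes" column, which
# column-name-based discovery would miss entirely.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, notes TEXT)")
conn.execute("INSERT INTO customers VALUES ('A. Smith', 'call back on jane@example.com')")
print(scan_free_text(conn, "customers", ["notes"]))
# → [('customers', 'notes', 1, 'email')]
```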
If an EU citizen withdraws their consent or requests that their data be deleted, organizations lacking automated data discovery across databases and environments might be forced to manually search through their data. This is slow and error-prone, and when you consider the ad hoc way in which testers copy and store data, it becomes highly unlikely that data will be removed “without delay”.
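To make “without delay” plausible, erasure has to be scriptable across every environment that holds a copy. A minimal sketch, which assumes each table keys personal data on an `email` column – real schemas rarely do, which is exactly why a data map built by automated discovery matters:

```python
import sqlite3

def erase_subject(conn, email, tables):
    """Delete every row referencing a data subject across the known tables.
    Assumes each table has an 'email' column; real schemas need a data map."""
    removed = {}
    for table in tables:
        cur = conn.execute(f"DELETE FROM {table} WHERE email = ?", (email,))
        removed[table] = cur.rowcount
    conn.commit()
    return removed

# Usage: the same subject appears in a production table and a test copy.
conn = sqlite3.connect(":memory:")
for table in ("crm_contacts", "test_copy_contacts"):
    conn.execute(f"CREATE TABLE {table} (email TEXT, notes TEXT)")
    conn.execute(f"INSERT INTO {table} VALUES ('jane@example.com', 'vip')")
print(erase_subject(conn, "jane@example.com", ["crm_contacts", "test_copy_contacts"]))
# → {'crm_contacts': 1, 'test_copy_contacts': 1}
```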
Underlining the real threat of non-compliance, 46% of 500 global IT professionals surveyed said that they had received customer requests to remove data in the last 12 months, and yet 41% admitted that they do not have defined processes, technology or documentation to remove the data.
Masking is commonly used to avoid the inappropriate use of personal data in non-production environments, but even this carries risks.
Masking data for testing is highly complex, and some original information and relationships are often retained in order to maintain referential integrity. As mentioned, sensitive information is also often found residing in ‘masked’ data, and there may then be instances where masked data itself needs to be deleted, by virtue of its resemblance to production data.
However, maintaining a mapping between masked data and the original is itself complex, and that knowledge carries an internal security risk. Throw in the danger of reverse-engineering masked data, or of combining data attributes to reveal personal information, and masking does not remove the risk of personal information being used for purposes other than those for which it was given.
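Deterministic masking illustrates both halves of this trade-off: hashing each value with a keyed function preserves referential integrity across tables, but the same determinism lets anyone who can guess plausible inputs rebuild the mapping. A hypothetical sketch – the `SECRET` key and `cust_` token format are assumptions for illustration, not a recommended scheme:

```python
import hashlib

SECRET = b"masking-key"  # hypothetical key; anyone holding it can re-derive mappings

def mask(value: str) -> str:
    """Deterministically mask a value: the same input always yields the
    same token, so joins and foreign keys survive across tables."""
    digest = hashlib.sha256(SECRET + value.encode()).hexdigest()[:12]
    return f"cust_{digest}"

# The same customer ID masks identically in both tables, so joins still work...
orders_key = mask("C1001")
invoices_key = mask("C1001")
assert orders_key == invoices_key

# ...but determinism is also the weakness: anyone who can enumerate likely
# inputs and holds the key can rebuild the mapping by brute force.
rainbow = {mask(f"C{i}"): f"C{i}" for i in range(1000, 1100)}
print(rainbow[orders_key])  # → C1001 (the original value, recovered)
```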
The only way to avoid providing testers with personal data is to not provide them with production data at all. A complete test data management strategy must therefore extend beyond data masking, enabling a full understanding of existing data and robust data access restriction, while allowing realistic but fictitious test data to be created.
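In practice, that last step means generating records that look and behave like production without ever reading a production value. A minimal sketch, in which the field names and value lists are hypothetical:

```python
import random
import string

random.seed(42)  # reproducible data makes test runs repeatable

# Hypothetical value pools; real generators would model production's
# distributions, formats, and edge cases far more closely.
FIRST = ["Alice", "Bob", "Carol", "Dan"]
LAST = ["Jones", "Nguyen", "Patel", "Smith"]

def fake_customer(i: int) -> dict:
    """Generate a realistic-looking but entirely fictitious customer record.
    No production value is ever read, so no personal data can leak."""
    first, last = random.choice(FIRST), random.choice(LAST)
    return {
        "id": f"C{i:05d}",
        "name": f"{first} {last}",
        "email": f"{first.lower()}.{last.lower()}{i}@example.test",
        "postcode": "".join(random.choices(string.ascii_uppercase, k=2))
                    + str(random.randint(1, 99)),
    }

customers = [fake_customer(i) for i in range(3)]
for c in customers:
    print(c["id"], c["email"])
```

Because every field is synthesized, these records fall outside the GDPR’s scope entirely: there is no consent to track, no mapping to protect, and nothing to erase on request.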