How Regulations Will Impact AI Innovation

The EU's General Data Protection Regulation will shape how machine learning is implemented on a global scale.

Sometimes, the tide of technological innovation seems unstoppable. But tech companies still have to abide by laws, rules and regulations, like the rest of us. Moreover, governments and other regulatory bodies are increasingly concerned with ensuring that basic rights and liberties don’t get washed away as the digital future rushes in. The General Data Protection Regulation (GDPR) provides a great example of how regulations will shape the development of powerful new technologies like big data analytics, machine learning and artificial intelligence.

Regulating Data Ownership

GDPR is a new European Union regulation that will come into force in late May 2018. It will impact tech innovators all over the world because it is concerned not with where an organization is based but with how it treats the personal data of European citizens. Organizations that handle such data will be forced to review their data processes and will face enormous fines (up to 4% of annual global turnover) if they fail to comply.

The implementation of GDPR will raise the bar for data protection by (at the very least) putting a greater emphasis on the concept of “data ownership” and regulating automated decisions based on personal information. This is a matter of concern for companies that specialize in data-hungry processes, especially processes based on machine learning. And some fear it may slow down the adoption of AI in the corporate market.

Machine Learning with Purpose

Are these fears justified? Understanding the details of GDPR and what they will mean in practice is tricky if you’re coming from a technical—rather than legal—perspective. Furthermore, how the regulation will be interpreted in practice is still unclear. But looking at the broad sweep of GDPR from an informed technical perspective, it is possible to make some estimates of how adoption of AI and machine learning may be impacted.

From this perspective, the most significant aspect of GDPR may be that it will require any company that collects a user’s personal data to have a clear purpose for doing so and clearly communicate that purpose to the user. This marks a paradigm shift from opt-out (“I have to notify an organization if I don’t want it to use the data it collects from me”) to opt-in (“I may agree to share my data in return for a tangible benefit”).

This sounds like great news for users, at least from a privacy perspective. But it may affect their ability to access the kind of tangible benefits that the opt-out paradigm has made possible. Up to this point, tech companies have been able to proactively address user needs by collecting their data, using this information to test hypotheses and putting machine learning features into production whenever the results are positive.

Does this approach violate the opt-in paradigm and the principle of data collection with purpose? GDPR allows the organization to use data for research purposes, even if the end-user is not notified. But whenever this research produces an outcome that will actually be used, the organization must notify the user and start re-collecting data from scratch. And it might well have to retrain its machine learning models with the new data.
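The research-versus-production distinction described above can be made concrete in code. The sketch below is a minimal, hypothetical illustration (the record structure and purpose names are my own, not part of GDPR): each record carries the purposes its owner has opted in to, and training data is filtered by purpose, so data consented for research cannot silently flow into a production feature.

```python
from dataclasses import dataclass, field

@dataclass
class UserRecord:
    user_id: str
    features: list
    # Purposes the user explicitly opted in to, e.g. {"research", "production"}
    consented_purposes: set = field(default_factory=set)

def records_for_purpose(records, purpose):
    """Return only the records whose owners opted in to this purpose.

    Under an opt-in reading of GDPR, moving a model from research to
    production means re-collecting consent: a record usable for
    "research" is not automatically usable for "production".
    """
    return [r for r in records if purpose in r.consented_purposes]

records = [
    UserRecord("u1", [0.2, 0.7], {"research"}),
    UserRecord("u2", [0.9, 0.1], {"research", "production"}),
]

research_set = records_for_purpose(records, "research")      # both users
production_set = records_for_purpose(records, "production")  # only u2
```

The practical cost shows up in the last two lines: the production training set is a strict subset of the research set, which is why a model validated in research may need to be retrained once real opt-in consent is gathered.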

Explaining the Benefits of AI

As previously noted, GDPR is not only focused on the purpose an organization has when it collects a user’s data but is also concerned with that organization’s ability to explain the decisions it makes based on that data. The most common interpretation of the regulation identifies an obligation to provide a human-understandable explanation of any automated decision that has a significant impact on the user.

This is a rare example of a regulation anticipating an emerging threat. GDPR was approved in 2016, but it wasn't until the following year that thought leaders started to seriously discuss the negative impact that decisions based on machine learning might have, especially if those decisions encode learned biases and prejudices. But some now fear that the requirement for explicability could make some machine learning-based functionality effectively impractical.

A softer interpretation suggests that it will be permissible to let the user give an application permission to make automated decisions on their behalf. Still, the request for this permission would have to be accompanied by a comprehensible explanation of how the app makes decisions and how these decisions may impact that user. This could be a problem for companies with AIs so advanced that nobody fully understands how they make decisions.
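For simple model families, a comprehensible explanation is at least feasible. The sketch below is one possible approach, not a regulator-approved format: it breaks a linear scoring decision into per-feature contributions and renders them as plain text, ordered by how much each feature moved the outcome. The feature names and threshold are invented for illustration.

```python
def explain_decision(weights, feature_names, feature_values, threshold):
    """Produce a human-readable breakdown of a linear scoring decision.

    A sketch only: what counts as a sufficient explanation under GDPR
    will depend on how regulators interpret the regulation in practice.
    """
    # Each feature's contribution is its weight times its value.
    contributions = {
        name: w * v
        for name, w, v in zip(feature_names, weights, feature_values)
    }
    score = sum(contributions.values())
    decision = "approved" if score >= threshold else "declined"
    lines = [f"Decision: {decision} (score {score:.2f}, threshold {threshold})"]
    # List features from most to least influential, signed.
    for name, c in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
        lines.append(f"  {name}: {c:+.2f}")
    return "\n".join(lines)

text = explain_decision(
    weights=[1.5, -2.0],
    feature_names=["income", "debt_ratio"],
    feature_values=[0.8, 0.3],
    threshold=0.5,
)
```

This kind of decomposition only works because the model is linear; for deep models whose internal logic "nobody fully understands", producing an equally faithful summary is exactly the open problem the article describes.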

With limited legal knowledge, it is hard to make a definitive statement about how all this will play out in practice. From a technical perspective, the level of granularity GDPR requires in explaining automated decisions is unclear. Until the picture becomes clearer, some innovators may choose to forge ahead with state-of-the-art algorithms. Others, it is feared, may bar European citizens from some highly valuable functionality.

The Data-Centric Software Factory

For those organizations that move ahead with application features based on machine learning and AI, the regulation will introduce a new level of complexity into an already complicated situation. Requirements such as the right to be forgotten and the right to rectification can only be met with a renewed focus on how data is managed: modifying historical records, re-executing learning processes, rectifying data and more.

These organizations will need infrastructure that allows them to manage the whole data lifecycle, which raises the question: Is your company ready for the data-centric software factory?
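To make the lifecycle problem concrete, here is a toy sketch of one possible design (the class and its hooks are hypothetical, not a prescribed architecture): the store tracks which models were trained on which users, so that erasing or rectifying a user's data automatically flags every affected model as stale and due for retraining.

```python
class DataStore:
    """Toy store illustrating GDPR lifecycle hooks (hypothetical design).

    Deleting or rectifying a record marks every model trained on it
    as stale, so learning processes can be re-executed on clean data.
    """
    def __init__(self):
        self.records = {}       # user_id -> personal data
        self.trained_on = {}    # model_name -> set of user_ids used in training
        self.stale_models = set()

    def add(self, user_id, data):
        self.records[user_id] = data

    def register_training(self, model_name, user_ids):
        """Record the data lineage of a trained model."""
        self.trained_on[model_name] = set(user_ids)

    def _flag_models_using(self, user_id):
        for model, users in self.trained_on.items():
            if user_id in users:
                self.stale_models.add(model)

    def forget(self, user_id):
        """Right to be forgotten: drop the data and flag affected models."""
        self.records.pop(user_id, None)
        self._flag_models_using(user_id)

    def rectify(self, user_id, data):
        """Right to rectification: correct the data, then retrain."""
        self.records[user_id] = data
        self._flag_models_using(user_id)

store = DataStore()
store.add("u1", {"age": 34})
store.add("u2", {"age": 29})
store.register_training("churn_model", ["u1", "u2"])
store.forget("u1")   # churn_model is now stale and must be retrained
```

The key design point is the lineage map from models back to the users whose data trained them; without it, an erasure request cannot be propagated into the learning process at all.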

By David Sanchez | February 20, 2018