Privacy Preserving Multiparty Analytics
Privacy and security are often discussed together, but they are not the same. Cambridge Analytica accessed private information about millions of Facebook users without breaking into Facebook. There was no security breach; it was purely a privacy invasion. And while Cambridge Analytica misused the data it downloaded, under the right circumstances sharing data can bring huge benefits to both businesses and societies.
For example, cancer researchers around the world currently send all of their data to an arm's-length third party in order to preserve study participants' privacy. Such research could be more productive if scientists were able to collaborate directly with other researchers around the aggregated data, but doing so would pose a privacy risk. Our Privacy-Preserving Multiparty Analytics research project focuses on data privacy and has the potential to solve this challenge. It asks how we can preserve privacy while still enabling organizations to build strong analytical models that benefit from shared data.
The benefits of this research go beyond the medical realm. For example, organizations can build stronger fraud detection, intrusion detection, and document classification models by combining their data with data from other organizations. Today, concerns center on sharing proprietary information with competitors or third parties. Privacy-Preserving Multiparty Analytics would allow organizations to share their data while maintaining ownership and control of it, without compromising privacy.
The classical method of preserving personal identity, anonymization, is not a sufficient guarantee of privacy. In our age of big data and machine learning, combinations of data features can still be used to identify individuals accurately. A newer approach is offered by a privacy-preserving technique called differential privacy, which adds calibrated random noise to query results so that no single individual's presence in the data can be reliably inferred.
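To make the idea concrete, here is a minimal sketch of a differentially private count query using the Laplace mechanism, a standard way to realize differential privacy. The function name and parameters are illustrative, not part of our framework; the noise scale is the query's sensitivity (1 for a count) divided by the privacy budget epsilon.

```python
import numpy as np

def private_count(values, threshold, epsilon):
    """Differentially private count of values above a threshold.

    A count query has sensitivity 1 (adding or removing one person
    changes the result by at most 1), so Laplace noise with scale
    1/epsilon makes the released count epsilon-differentially private.
    """
    true_count = sum(1 for v in values if v > threshold)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise
```

Smaller values of epsilon add more noise and give stronger privacy; larger values give more accurate answers at the cost of privacy.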
In our research, we developed a differentially private, distributed deep-learning framework. We then tested it by constructing federated fraud detection models for six fictitious banks using differentially private model parameters. The original data was anonymized and provided by a single bank, a research partner. Each simulated bank constructed a local model for credit card fraud detection and did not share transaction or customer data with the central server or the other banks.
Using our privacy-preserving techniques, the banks shared learning gradients that had been modified with small amounts of random noise, akin to scrambling a signal. Our privacy-preserving technique created strong fraud detection models more quickly, and with the same accuracy, as models constructed without privacy preservation. This research has far-reaching implications in a world where collaborating and sharing data safely while maintaining privacy will accelerate innovation of all kinds.
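The gradient-sharing step described above can be sketched as follows. This is a simplified illustration, not our framework's implementation: function names are hypothetical, and we use gradient clipping plus Gaussian noise, a common recipe in differentially private federated learning. Each party perturbs its local gradient before sharing, and the server only ever sees and averages the noisy versions.

```python
import numpy as np

def clip_and_noise(grad, clip_norm, noise_std, rng):
    # Clip the gradient to bound its norm (its sensitivity), then add
    # Gaussian noise before sharing -- akin to scrambling a signal.
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / norm)
    return clipped + rng.normal(0.0, noise_std, size=grad.shape)

def federated_step(weights, local_grads, lr, clip_norm=1.0,
                   noise_std=0.01, seed=0):
    # The central server averages the noisy gradients from all parties
    # and applies one gradient-descent update to the shared model.
    rng = np.random.default_rng(seed)
    noisy = [clip_and_noise(g, clip_norm, noise_std, rng)
             for g in local_grads]
    return weights - lr * np.mean(noisy, axis=0)
```

Averaging across parties also averages out some of the injected noise, which is one reason federated models can retain accuracy despite the perturbation.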