Google has open-sourced a differential privacy library that helps power some of its core products.
What is differential privacy?
Differential privacy is a method for analyzing the data in a database and drawing useful insights from it without revealing information about any individual record to the analysts. It's meant to keep sensitive information usable while making it very hard to trace results back to a specific person.
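For readers who want the formal version, the standard textbook definition can be written as follows. The symbols are not from Google's announcement: M stands for a randomized analysis, D and D' for two databases that differ in one individual's data, S for any set of possible outputs, and ε for the privacy parameter (smaller means stronger privacy).

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
```

In words: adding or removing one person's data changes the probability of any result by at most a factor of e^ε, so the output says little about whether that person is in the database.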
“Differentially-private data analysis is a principled approach that enables organizations to learn from the majority of their data while simultaneously ensuring that those results do not allow any individual’s data to be distinguished or re-identified,” noted Miguel Guevara, Product Manager, Privacy and Data Protection Office at Google.
“This type of analysis can be implemented in a wide variety of ways and for many different purposes. For example, if you are a health researcher, you may want to compare the average amount of time patients remain admitted across various hospitals in order to determine if there are differences in care.”
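To make the hospital example concrete, here is a minimal Python sketch of one textbook way to release a differentially private average: clamp each value to a known range, add Laplace noise to the sum and to the count, then divide. It is not the library's API or its algorithm. The function names, the 0 to 30 day bounds, and the assumption that each patient contributes exactly one record are all illustrative, and plain floating-point noise like this is known to be unsafe for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_mean(values, lower, upper, epsilon, rng):
    """Noisy mean of values clamped to [lower, upper].

    Assumes each person contributes exactly one value (hypothetical setup).
    Half of epsilon is spent on the sum and half on the count.
    """
    clamped = [min(max(v, lower), upper) for v in values]
    half = epsilon / 2.0
    # Adding or removing one clamped value moves the sum by at most this much.
    sum_sensitivity = max(abs(lower), abs(upper))
    noisy_sum = sum(clamped) + laplace_noise(sum_sensitivity / half, rng)
    # Adding or removing one record moves the count by exactly 1.
    noisy_count = len(clamped) + laplace_noise(1.0 / half, rng)
    noisy_count = max(noisy_count, 1.0)  # avoid dividing by zero or a negative
    # Clamping the result is post-processing and does not weaken the guarantee.
    return min(max(noisy_sum / noisy_count, lower), upper)


# Made-up lengths of stay in days; 90.0 is clamped to the upper bound of 30.
stays = [2.0, 5.0, 3.5, 12.0, 1.0, 7.0, 4.0, 90.0]
print(private_mean(stays, 0.0, 30.0, 1.0, random.Random(42)))
```

With only a handful of records the noise dominates the answer; the approach becomes useful when each hospital has many patients.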
Using the library
Google uses the library, for example, to show Google Maps users how busy a restaurant is over the course of the day.
Developers in many sectors can use the open-sourced library and its accompanying interface to build similar features.
“Most common data science operations are supported by this release. Developers can compute counts, sums, averages, medians, and percentiles using our library,” Guevara shared, and noted that they designed the library so that it can be extended to include other functionalities such as additional mechanisms, aggregation functions, or privacy budget management.
“The real utility of an open-source release is in answering the question ‘Can I use this?’ That’s why we’ve included a PostgreSQL extension along with common recipes to get you started,” he added.
Why use this library?
The library has been released under the Apache License, meaning that developers can freely use it, distribute it, modify it and distribute modified versions of it under the terms of the license.
The difference between this Google differential privacy implementation and other existing ones is that this one can work with a database that includes multiple records per user.
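Multiple records per user matter because the privacy guarantee is meant to protect a person, not a row. One common way to handle this, sketched below in Python, is to cap how many records each user may contribute and scale the noise to that cap. This is an illustration under assumptions, not the library's code: `bound_contributions`, `private_count`, and the `(user_id, value)` record shape are hypothetical names chosen for the example.

```python
import random
from collections import defaultdict


def bound_contributions(records, max_per_user, rng):
    """Keep at most max_per_user records for each user, chosen at random."""
    by_user = defaultdict(list)
    for user_id, value in records:
        by_user[user_id].append(value)
    bounded = []
    for user_id, values in by_user.items():
        if len(values) > max_per_user:
            values = rng.sample(values, max_per_user)
        bounded.extend((user_id, v) for v in values)
    return bounded


def private_count(records, max_per_user, epsilon, rng):
    """Noisy record count where one user can shift the total by at most max_per_user."""
    bounded = bound_contributions(records, max_per_user, rng)
    # One user now changes the count by at most max_per_user, so the
    # Laplace scale is max_per_user / epsilon.
    rate = epsilon / max_per_user
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return len(bounded) + noise


# User "a" visits ten times, user "b" once; "a" is capped at three records.
visits = [("a", float(i)) for i in range(10)] + [("b", 1.0)]
print(private_count(visits, 3, 1.0, random.Random(0)))
```

Without the cap, a single heavy user could move the count by an unbounded amount and no fixed noise level would hide them.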
Google privacy software engineer Damien Desfontaines pointed out additional pros in a Twitter thread:
"OK so why am I so excited about this release? So many reasons! (A thread.) Note before I start: all of what follows is my opinion, not my employers' =) https://t.co/mS8D6rKBDg"
Ted (@TedOnPrivacy), September 5, 2019
The most important thing about this release is that Google also provided a stochastic tester to help spot implementation glitches and problems that could make the differential privacy property no longer hold. This will allow developers to make sure their implementation works as it should.
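The idea behind such a tester can be sketched in a few lines of Python. This is a simplified illustration, not Google's tester: it runs a mechanism many times on two databases that differ by one record, buckets the outputs, and checks whether any bucket is far more likely under one database than the other. For a mechanism with privacy parameter epsilon, the log of that ratio should stay near or below epsilon. All names, the bucket width, and the sample thresholds are assumptions made for the example.

```python
import math
import random
from collections import Counter


def noisy_count(db, rng, epsilon):
    """Count with Laplace noise of scale 1/epsilon (the mechanism under test)."""
    return len(db) + rng.expovariate(epsilon) - rng.expovariate(epsilon)


def worst_log_ratio(mechanism, db_a, db_b, trials, bucket_width, min_total, rng):
    """Largest observed |log ratio| of bucket frequencies on neighboring inputs."""
    hist_a = Counter(math.floor(mechanism(db_a, rng) / bucket_width)
                     for _ in range(trials))
    hist_b = Counter(math.floor(mechanism(db_b, rng) / bucket_width)
                     for _ in range(trials))
    worst = 0.0
    for bucket in set(hist_a) | set(hist_b):
        count_a, count_b = hist_a[bucket], hist_b[bucket]
        if count_a + count_b < min_total:
            continue  # too few samples to say anything reliable
        # Add-one smoothing keeps the ratio finite when a bucket is empty.
        worst = max(worst, abs(math.log((count_a + 1) / (count_b + 1))))
    return worst


rng = random.Random(7)
db_a, db_b = [0] * 5, [0] * 6  # neighboring databases: one extra record
good = worst_log_ratio(lambda db, r: noisy_count(db, r, 1.0),
                       db_a, db_b, 20000, 0.5, 500, rng)
# A buggy mechanism that forgets to add noise.
bad = worst_log_ratio(lambda db, r: float(len(db)),
                      db_a, db_b, 20000, 0.5, 500, rng)
print(good, bad)
```

A check like this can only find violations, never prove their absence, and it needs a tolerance above epsilon to absorb sampling error.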
Google is looking for feedback on the library from academic and technical communities around the world but, for the time being, will not accept pull requests.