Uncovering a privacy-preserving approach to machine learning

In the era of data-driven decision making, businesses are harnessing the power of machine learning (ML) to unlock valuable insights, gain operational efficiencies, and solidify competitive advantage.


Although recent developments in generative artificial intelligence (AI) have raised unprecedented awareness around the power of AI/ML, they have also illuminated the foundational need for privacy and security. Organizations such as the IAPP and Brookings, along with frameworks like Gartner's recent AI TRiSM, have outlined key considerations for businesses looking to achieve the outcomes uniquely available through AI without increasing their risk profile.

At the forefront of these imperatives is ML model security. Directly addressing this need, privacy-preserving machine learning has emerged as a path to ensure that organizations can capitalize on the full potential of ML applications without compromising the models themselves.

Using machine learning to generate insights

Machine learning models are algorithms that process data to generate meaningful insights and inform critical business decisions. What makes ML remarkable is its ability to continuously learn and improve: as a model is trained on new and disparate datasets, it becomes smarter over time, producing increasingly accurate and valuable insights that were previously inaccessible. Using a trained model to generate insights from new data is referred to as model evaluation or inference.
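
To make the train-then-infer lifecycle concrete, here is a minimal sketch using scikit-learn with synthetic data; the dataset and model choice are illustrative assumptions, not specifics from this article.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for historical business data; purely illustrative
    X, y = make_classification(n_samples=500, n_features=8, random_state=0)
    X_train, X_new, y_train, _ = train_test_split(X, y, random_state=0)

    # Training: the model learns patterns from historical data
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Evaluation/inference: the trained model generates insights from new data
    insights = model.predict(X_new)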

To deliver the best outcomes, models need to be trained and/or evaluated over a variety of rich data sources. When these sources contain sensitive or proprietary information, using them for model training or evaluation/inference raises significant privacy and security concerns. Any vulnerability in the model itself becomes a liability for the entity using it: a capability that promised to deliver business-enhancing, actionable insights is now increasing the organization's risk profile.

This issue is one of the main barriers preventing broader use of ML today. Businesses are challenged with balancing the benefits of ML with the need to protect their interests and comply with ever-evolving privacy and regulatory requirements.

Vulnerabilities in ML models

Vulnerabilities in ML models typically give rise to two broad categories of attack: model inversion and model spoofing.

Model inversion attacks target the model itself to reverse engineer the data on which it was trained, data that is likely sensitive and therefore valuable to the attacker. This could include personally identifiable information (PII), intellectual property (IP), and other sensitive or regulated information that, if exposed, could wreak havoc upon the organization.
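
As a rough, white-box illustration of the inversion idea, the sketch below shows how an attacker who can observe a toy logistic model's confidence scores could climb that confidence surface to reconstruct a representative input for a class; the weights and attack parameters are invented for illustration only.

    import numpy as np

    # Toy "target" model the attacker probes; weights are invented
    w, b = np.array([1.2, -0.7, 2.0, 0.3]), -0.5
    score = lambda x: 1 / (1 + np.exp(-(w @ x + b)))

    # Starting from an empty input, climb the model's confidence surface
    x = np.zeros(4)
    for _ in range(200):
        s = score(x)
        grad = s * (1 - s) * w      # gradient of the score w.r.t. the input
        x += 0.5 * grad             # ascend toward a high-confidence input

    # x now approximates the feature pattern the model associates with the
    # class, leaking information about what the training data looked like
    print(np.round(x, 2), round(float(score(x)), 3))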

Model spoofing, on the other hand, is a form of adversarial machine learning in which an attacker deceives the model by manipulating input data so that the model makes incorrect decisions aligned with the attacker's intentions. The attacker carefully observes, or “learns,” the behavior of the model and then alters the input data, often imperceptibly, to trick the model into making decisions that advance the attacker's objectives. Both of these attacks target vulnerabilities related to model weights, an essential part of an ML model. As such, the critical need to prioritize model weight protection was highlighted during the recent White House-convened discussion on AI risk.
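
One well-known instance of this input-manipulation idea is the fast gradient sign method (FGSM). The following sketch applies it to a toy linear classifier; the weights, input, and perturbation budget are all invented for illustration.

    import numpy as np

    # Toy linear model standing in for the victim; weights are invented
    w, b = np.array([2.0, -1.5, 0.5]), 0.1
    predict = lambda x: 1 / (1 + np.exp(-(w @ x + b)))

    x = np.array([0.4, 0.2, 0.9])            # legitimate input, scored as class 1

    # FGSM: step each feature against the sign of the score's gradient
    grad = w                                 # d(logit)/dx for a linear model
    epsilon = 0.3                            # attacker's perturbation budget
    x_adv = x - epsilon * np.sign(grad)      # push the score toward class 0

    print(predict(x), predict(x_adv))        # ~0.74 vs ~0.46: the decision flips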

Using privacy enhancing technologies

Privacy-preserving machine learning uses advances in privacy enhancing technologies (PETs) to address these vulnerabilities head-on. PETs are a family of technologies that preserve and enhance the privacy and security of data throughout its processing lifecycle, uniquely enabling secure and private data usage. These powerful technologies allow businesses to encrypt sensitive ML models, run and/or train them, and extract valuable insights while eliminating the risk of exposure. As a result, businesses can securely leverage disparate data sources, including across organizational boundaries and security domains, even when competitive interests are involved.

Two notable pillars of the PETs family that enable secure and private ML are homomorphic encryption and secure multiparty computation (SMPC).

Homomorphic encryption is a technology that enables businesses to perform computations directly on encrypted data, preserving the privacy of the content of the search or analytic. By homomorphically encrypting ML models, organizations can run or evaluate them over sensitive data sources without exposing the underlying model data, allowing models trained on sensitive data to be leveraged outside the confines of their trusted environment.
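
To show the core property, here is a minimal sketch of an additively homomorphic (Paillier-style) scheme, with tiny, insecure parameters chosen purely for illustration. Production systems would use a vetted library and schemes suited to ML workloads; this toy only demonstrates that computing on ciphertexts yields a valid ciphertext of the result.

    import math, random

    def keygen(p=293, q=433):                   # toy primes; NOT secure
        n = p * q
        lam = math.lcm(p - 1, q - 1)
        mu = pow(lam, -1, n)                    # valid because g = n + 1 below
        return (n, n + 1), (lam, mu, n)

    def encrypt(pub, m):
        n, g = pub
        r = random.randrange(1, n)
        while math.gcd(r, n) != 1:              # randomness must be a unit mod n
            r = random.randrange(1, n)
        return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

    def decrypt(priv, c):
        lam, mu, n = priv
        # L(x) = (x - 1) // n recovers m * lam; multiplying by mu isolates m
        return ((pow(c, lam, n * n) - 1) // n) * mu % n

    pub, priv = keygen()
    c1, c2 = encrypt(pub, 17), encrypt(pub, 25)
    # Multiplying ciphertexts adds the hidden plaintexts: E(17) * E(25) -> E(42)
    assert decrypt(priv, (c1 * c2) % (pub[0] ** 2)) == 42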

Using SMPC, organizations can train models in an encrypted manner, protecting the model development process, the data used for training, and the interests and intent of the parties involved. Models can be collaboratively trained on sensitive data without the risk of exposure. This approach ensures privacy, security, and confidentiality while harnessing the collective power of diverse datasets to enhance the accuracy and effectiveness of machine learning models.
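
A core building block behind such protocols is additive secret sharing, sketched below as a secure sum across three parties; this is an illustration of the primitive, not a full training protocol, and the values are invented.

    import random

    MOD = 2**61 - 1                             # work modulo a large prime

    def share(value, n_parties):
        # Split a private value into additive shares that sum to it mod MOD;
        # any subset of fewer than n shares reveals nothing about the value
        shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
        shares.append((value - sum(shares)) % MOD)
        return shares

    # Three parties, each holding a private statistic they will not reveal
    private_values = [120, 45, 300]
    all_shares = [share(v, 3) for v in private_values]

    # Each party locally sums one share received from every other party
    partial_sums = [sum(col) % MOD for col in zip(*all_shares)]

    # Only the combination of all partial sums reveals the aggregate
    total = sum(partial_sums) % MOD
    assert total == sum(private_values)         # 465, computed without exposure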

Conclusion

The increasing reliance on machine learning to enhance business activity is not a passing trend, and neither are the significant risks associated with ML models. Once the core value that AI/ML can deliver for the organization is established, building in security, risk, and governance controls is the next step toward adoption.

Advancements in PETs are providing a promising path forward. Privacy-preserving machine learning uniquely enables organizations to securely unlock the full potential of ML while upholding privacy, complying with regulatory directives, and safeguarding sensitive data. By embracing this security-forward approach, organizations can navigate the data-driven landscape with confidence, harnessing valuable insights while maintaining the trust of customers and stakeholders.
