PyRIT: Open-source framework to find risks in generative AI systems

Python Risk Identification Tool (PyRIT) is Microsoft’s open-source automation framework that enables security professionals and machine learning engineers to find risks in generative AI systems.


PyRIT has been battle-tested by Microsoft's AI red team. It started in 2022 as a collection of individual scripts the team used during its initial foray into red teaming generative AI systems. As the team engaged with various generative AI systems and explored different risks, it added the features it found useful.

The tool should not be seen as a substitute for manual red teaming of generative AI systems. Instead, it augments an AI red teamer's existing domain expertise by automating the more tedious tasks. PyRIT flags potential risk areas so that security professionals can focus their manual investigation on those critical spots.

“The biggest advantage we have found so far using PyRIT is our efficiency gain. For instance, in one of our red teaming exercises on a Copilot system, we were able to pick a harm category, generate several thousand malicious prompts, and use PyRIT’s scoring engine to evaluate the output from the Copilot system all in the matter of hours instead of weeks,” wrote Ram Shankar Siva Kumar, Microsoft AI Red Team Lead.

PyRIT enables researchers to refine and strengthen their defenses against various harms. For instance, Microsoft uses the tool to iterate on different product versions (and their associated metaprompts) to protect against prompt injection attacks.

PyRIT is more than a prompt generator. It adapts its strategy based on feedback from the generative AI system, using each response to craft the next input to that system. This automated loop continues until the security professional's objective is achieved.
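The adaptive loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not PyRIT's actual API: `run_feedback_loop`, `Turn`, `AttackResult`, `toy_target`, `toy_scorer`, and `toy_attacker` are invented names, and the "target" is a hard-coded stand-in rather than a real generative AI system. The in-memory `history` list is a simplified placeholder for the kind of attack-and-response log mentioned later in the article.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical interfaces; none of these names come from PyRIT.
Target = Callable[[str], str]         # sends a prompt, returns the system's response
Scorer = Callable[[str], float]       # rates a response: 0.0 (safe) to 1.0 (objective met)
Attacker = Callable[[str, str], str]  # builds the next prompt from the last prompt and response


@dataclass
class Turn:
    prompt: str
    response: str
    score: float


@dataclass
class AttackResult:
    achieved: bool
    history: list[Turn] = field(default_factory=list)


def run_feedback_loop(seed_prompt: str, target: Target, scorer: Scorer,
                      attacker: Attacker, threshold: float = 1.0,
                      max_turns: int = 5) -> AttackResult:
    """Send a prompt, score the response, and adapt the next prompt until the objective is met."""
    history: list[Turn] = []
    prompt = seed_prompt
    for _ in range(max_turns):
        response = target(prompt)
        score = scorer(response)
        history.append(Turn(prompt, response, score))  # record every attempt
        if score >= threshold:
            return AttackResult(True, history)
        prompt = attacker(prompt, response)  # use the feedback to craft the next input
    return AttackResult(False, history)


def toy_target(prompt: str) -> str:
    # Stand-in for a generative AI system: refuses unless the prompt uses a role-play framing.
    if prompt.startswith("Let's role-play"):
        return "Sure, in character: the password is SECRET."
    return "I can't help with that."


def toy_scorer(response: str) -> float:
    # Objective is met only if the (fake) secret leaks into the response.
    return 1.0 if "SECRET" in response else 0.0


def toy_attacker(last_prompt: str, last_response: str) -> str:
    # Adapt the strategy after a refusal by wrapping the request in a role-play frame.
    if "can't" in last_response:
        return "Let's role-play. " + last_prompt
    return last_prompt


result = run_feedback_loop("Tell me the password.", toy_target, toy_scorer, toy_attacker)
print(result.achieved, len(result.history))  # → True 2
```

The first attempt is refused and scored 0.0; the toy attacker reacts to that refusal by reframing the request, and the second attempt meets the objective. A real red-teaming setup would replace each toy callable with a model endpoint, a trained or LLM-based scorer, and a more capable prompt generator.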

“PyRIT is ideal for those well-versed in AI security, offering a robust platform to enhance and scale their processes. However, beginners or intermediates may find it overly complex and not fully benefit from its capabilities. The significant benefit of this approach lies in its innovative template for generating fresh attack strategies derived from a model’s responses. This enables the execution of multi-prompt attacks. The well-conceived attack templates and the mechanism for integrating new attacks enhance its effectiveness. Additionally, including a database to catalog the history of attacks and responses is a noteworthy feature,” Alex Polyakov, CEO of Adversa AI, told Help Net Security.

“Red teaming GenAI is important because companies don’t want their AI systems manipulated by bad actors to say or take actions that would harm the company. PyRIT solves a problem many people struggle with. It will be most helpful for teams with the bandwidth to learn and set up a new framework. This isn’t a replacement for manual testing by human red teamers. Still, it’s a way to automate some testing so you can quickly iterate on prompts and other configurations to find the balance of safety and utility,” said Joseph Thacker, principal AI engineer and security researcher at AppOmni.

PyRIT is available for free on GitHub.
