Make OpenAI’s models misbehave and earn a reward

OpenAI’s public Safety Bug Bounty program focuses on AI abuse and safety risks across its products. The goal is to support safe and secure systems and reduce the risk of misuse that could lead to harm.

This program complements the Security Bug Bounty. It accepts reports of abuse and safety risks that do not meet the criteria for a security vulnerability. Submissions are reviewed by teams from both programs based on scope and ownership.

OpenAI Safety Bug Bounty

Safety Bug Bounty program overview

The program focuses on AI-specific scenarios such as agentic risks, including MCP, exposure of OpenAI proprietary information, and risks to account and platform integrity.

Agentic risks include cases where attacker-controlled text can hijack an agent, such as a browser-based agent or a ChatGPT agent. The agent may then perform harmful actions or expose sensitive user information. The behavior must be reproducible at least half of the time.

An agentic OpenAI product may perform disallowed actions on OpenAI’s website at scale. It may also carry out other harmful actions that are not explicitly listed, as long as the harm is plausible and material. Testing for MCP risk must comply with the terms of service of relevant third parties.

OpenAI proprietary information risks include cases where model outputs reveal internal reasoning or other confidential information. This also includes vulnerabilities that expose additional proprietary information.

Account and platform integrity risks include weaknesses in systems that enforce rules and protect accounts. These may involve bypassing anti-automation measures, manipulating trust signals, or evading restrictions such as suspensions or bans. Issues that allow access to features, data, or functionality beyond authorized permissions should be reported through the Security Bug Bounty program.

“While jailbreaks are out of scope for this program, we periodically run private bug bounty campaigns focused on certain harm types, such as Biorisk content issues in ChatGPT Agent⁠ and GPT‑5⁠. We invite interested researchers to apply to these programs when they arise,” the company explained in a blog.

Researchers may receive rewards when they identify issues that could lead to user harm and provide steps to fix them. Reports that show general content policy bypasses without safety or abuse impact are not in scope. Issues that are easy to find or already widely known are also excluded.

More about

Make OpenAI’s models misbehave and earn a reward

Safety Bug Bounty program overview

Featured news

Resources

Don't miss