OpenAI tackles a bad habit people have when interacting with AI

Since people tend to paste personal data into AI tools such as ChatGPT, OpenAI has released Privacy Filter, an open-weight model designed to detect and redact personally identifiable information (PII) in text. The model is available under the Apache 2.0 license on Hugging Face and GitHub.


“This release is part of our broader effort to support a more resilient software ecosystem by providing developers with practical infrastructure for building with AI safely, including tools and models that make privacy and security protections easier to implement from the start,” the company said in the announcement.

OpenAI claims it uses a fine-tuned version of Privacy Filter in its own privacy-preserving workflows.

Privacy Filter analyzes language and context to identify sensitive information. It can detect a broad range of PII in unstructured text, including cases where classification depends on context. The model distinguishes between public information and data linked to private individuals, reducing unnecessary redaction while still masking sensitive details.

“The model is small enough to be run locally, meaning data that has yet to be filtered can remain on device, with less risk of exposure, rather than needing to be sent to a server for de-identification,” OpenAI wrote.

Privacy Filter sorts sensitive data into eight categories: names, addresses, emails, phone numbers, URLs, dates, account numbers, and secrets. The account number category covers things like credit cards and bank accounts, while secrets include passwords and API keys.
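To illustrate how category-tagged detections translate into redacted output, here is a minimal sketch that masks detected spans with category placeholders. The span format `(start, end, category)` and the lowercase category names are assumptions for illustration, not the model's actual output schema.

```python
def redact(text, spans):
    """Replace each detected (start, end, category) span with a [CATEGORY]
    placeholder, working right to left so earlier offsets stay valid."""
    for start, end, category in sorted(spans, reverse=True):
        text = text[:start] + f"[{category.upper()}]" + text[end:]
    return text

sample = "Email jane@example.com, card 4111 1111 1111 1111."
detections = [(6, 22, "email"), (29, 48, "account_number")]
print(redact(sample, detections))
# "Email [EMAIL], card [ACCOUNT_NUMBER]."
```

Processing spans from the end of the string backward avoids recomputing offsets after each replacement, since a placeholder is rarely the same length as the text it replaces.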

The system uses a token classification approach, labeling input text in a single pass rather than generating it token by token. It supports long documents with a context window of up to 128,000 tokens. While it has 1.5 billion total parameters, only about 50 million are active during use, which helps improve speed.
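The single-pass token classification approach can be pictured as follows: every token receives a label in one forward pass, and adjacent labels are then merged into entity spans. The toy function below sketches that merging step using BIO-style tags; the tag scheme and label names here are illustrative assumptions, not documented output of Privacy Filter.

```python
def merge_bio(tokens, labels):
    """Collapse BIO-labeled tokens into (text, category) entity spans:
    B- starts a new entity, a matching I- continues it, O ends it."""
    spans, current = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current:
                spans.append(tuple(current))
            current = [token, label[2:]]
        elif label.startswith("I-") and current and label[2:] == current[1]:
            current[0] += " " + token
        else:
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans

tokens = ["Call", "Jane", "Doe", "at", "555-0100", "."]
labels = ["O", "B-NAME", "I-NAME", "O", "B-PHONE", "O"]
print(merge_bio(tokens, labels))
# [('Jane Doe', 'NAME'), ('555-0100', 'PHONE')]
```

Because each token is classified rather than generated, latency scales with one pass over the input instead of one decoding step per output token, which is what makes this design fast for long documents.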

OpenAI tested the model on the PII-Masking-300k benchmark, which measures how well systems detect and mask personal data. It reported an F1 score of 96%, with 94.04% precision and 98.04% recall. On a revised version of the dataset, the score increased to 97.43%, with 96.79% precision and 98.08% recall.

The model can also be adapted to specific domains, with OpenAI noting that fine-tuning on smaller datasets can improve performance.

“Like all models, Privacy Filter can make mistakes. It may miss uncommon identifiers or ambiguous references, and it can over- or under-redact information when context is limited, especially in shorter text. In high-sensitivity areas such as legal, medical, and financial workflows, human review and domain-specific evaluation and fine-tuning remain important,” OpenAI warned.

This solution should help address common privacy risks in AI workflows and may prevent personal data from ending up where it doesn’t belong.
