Veritone secures AI data with automated PII removal

Veritone deploys Veritone Redact with Veritone Data Refinery (VDR) to remove personally identifiable information (PII) and sensitive data before processing, enabling AI-ready data while protecting intellectual property (IP) and data owner rights.

As the scale and stakes of AI deployments put pressure on enterprises and hyperscalers alike to ensure training data is properly licensed, with PII and other sensitive data removed, VDR is designed to help ensure data is clean from the outset. This helps companies meet strict industry compliance and privacy standards and allows a broader range of companies to innovate and compete in the AI space.

“We are committed to helping data-driven organizations protect their valuable assets and help ensure that their data is used cleanly and ethically,” said Ryan Steelberg, CEO of Veritone.

“We’re proud to use our own proprietary, AI-enabled tool, Redact, which is traditionally used by our public sector customers, including the Department of Justice, and state and local police agencies, to help ensure PII is safeguarded before going through refinement. This is a prime example of our dedication to providing innovative solutions that the market needs while fostering a more responsible and ethical AI ecosystem for everyone,” Steelberg added.

Veritone Redact is an application for public safety and law enforcement agencies that automates the process of redacting sensitive information, including PII, from audio, video, and image-based evidence. The application reduces manual redaction processes while increasing accuracy, minimizing errors, and helping agencies meet important deadlines.

Last year, Veritone announced enhancements to Redact, including AI-powered voice masking, inverse blur, and transcription in 64 languages. These capabilities improve how organizations approach audio and video redaction, addressing critical privacy, compliance, and productivity needs across legal, law enforcement, and corporate environments.

According to the Stanford HAI 2025 AI Index Report, citing Epoch AI research, training datasets are doubling in size roughly every eight months as model scale continues to grow, increasing the volume of data flowing into AI systems. The race to train models on ever-larger datasets has raised pressing concerns about legal and ethical risks. An audit of more than 1,800 text datasets, published on arXiv (a preprint repository operated by Cornell University), found frequent miscategorization of licenses on widely used dataset-hosting sites, with licenses omitted in more than 70% of cases.

Additionally, Veritone is seeing demand for VDR from both content owners and hyperscalers: the volume of data it processed grew 3.5-fold in the second half of 2025 compared with the first half, reinforcing that the need for compliant, ethically sourced, AI-ready datasets is higher than ever.
