Roblox chat moderation gets bypassed by leet speak and code words

Roblox runs an automated chat filter at the scale of billions of messages per day. An independent audit of about two million chat messages from four of the platform’s most popular games shows that filter missing a wide range of harmful interactions, including grooming attempts, sexual content directed at minors, threats of violence, and references to self-harm.


Researchers from the University of Arizona and Arizona State University collected the messages from public servers covering the 9+ and 13+ age tiers. The dataset spans 105,214 users and 336 hours of recorded gameplay. Roblox does not offer an API for chat data, so the team captured the in-game chat window on video and transcribed it with optical character recognition.

What got through

The audit groups escaped content into categories that mirror Roblox’s community standards. Grooming was the most common pattern in the reviewed sample. Examples include users coaxing other players to share their location, age, or images, and steering conversations toward in-person meetings. One exchange documented in the paper involves a user disclosing a home address and another responding with plans to meet that day. A separate thread shows panic after a player realized a stranger had obtained their location.

Sexual content also appeared in volume, including solicitation, sexting, and roleplay that escalated into explicit territory. Bullying, racial harassment, and slurs surfaced regularly, sometimes with only individual words masked and the surrounding meaning intact. The reviewers also found self-harm statements, violent threats described in graphic terms, and attempts to move conversations to TikTok, Discord, Snapchat, and YouTube where Roblox’s filter no longer applies.

Roblox runs context-aware AI moderation that goes beyond a simple keyword blocklist. The audit found the system is good at masking isolated profane words and does sometimes redact at the phrase level. Harm that builds across multiple turns tends to pass through.

How users work around the filter

The researchers reviewed 12,612 messages from 94 users who had been moderated at least once before, looking at what those users did next. Six recurring evasion techniques came up.

Users split blocked phrases across several short messages so each line on its own looks harmless. They retry filtered words with new spellings, phonetic substitutions, or added punctuation. They use code words and abbreviations, including shorthand like “f4” for the f-word and “btc” for a common slur. They swap letters for numbers or symbols in the style of older internet leetspeak. They probe the filter with variations to learn what passes, then reassemble the original meaning once they find a form that goes through. One sequence in the paper shows a user testing several spellings of “Discord” before landing on a description the filter let through.
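To make two of these techniques concrete, here is a minimal Python sketch of how character swaps and split messages might be undone before a blocklist check. It is not Roblox's filter and not the researchers' tooling. The substitution table, the four-message window, the `SplitMessageChecker` class, and the single placeholder term "discord" are all assumptions chosen for illustration.

```python
from __future__ import annotations

from collections import defaultdict, deque

# Hypothetical substitution table; real evasion uses far more variants.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

BLOCKED_TERMS = {"discord"}  # placeholder term, for illustration only
WINDOW = 4  # assumed number of recent messages to rejoin per user


def normalize(text: str) -> str:
    """Lowercase, undo common character swaps, and drop everything but letters."""
    swapped = text.lower().translate(LEET_MAP)
    return "".join(ch for ch in swapped if ch.isalpha())


class SplitMessageChecker:
    """Keeps each user's last few normalized messages and checks them joined."""

    def __init__(self, window: int = WINDOW) -> None:
        self._recent: dict[str, deque[str]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def check(self, user_id: str, message: str) -> set[str]:
        """Return blocked terms found in the user's recent messages, joined."""
        self._recent[user_id].append(normalize(message))
        joined = "".join(self._recent[user_id])
        return {term for term in BLOCKED_TERMS if term in joined}


checker = SplitMessageChecker()
first = checker.check("user_1", "d1s")   # harmless on its own
second = checker.check("user_1", "c0")   # still harmless
third = checker.check("user_1", "rd!")   # the joined window now spells the term
```

Joining messages with no separator will also produce false positives where unrelated words happen to run together, so a check like this would more plausibly feed a review queue than block messages outright.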

The distribution of flagged messages is heavily skewed. A small group of users accounts for the bulk of moderated content and keeps trying new bypass methods after each block. Moderation decisions appear to operate on individual messages without much memory of a user’s earlier behavior.
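One simple way to quantify that kind of skew is to ask what share of flagged messages comes from the most-flagged accounts. The sketch below assumes a moderation log reduced to one user ID per flagged message. The function name `top_user_share` and the sample data are invented for illustration and do not come from the paper.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def top_user_share(flagged_user_ids: Iterable[str], top_n: int) -> float:
    """Fraction of flagged messages sent by the top_n most-flagged users."""
    counts = Counter(flagged_user_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(top_n))
    return top / total


# Made-up log: one entry per flagged message, keyed by the sender.
sample_log = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
share_top_1 = top_user_share(sample_log, 1)  # 6 of 10 flagged messages
share_top_2 = top_user_share(sample_log, 2)  # 8 of 10 flagged messages
```

The same per-user counts are the kind of memory the audit suggests is missing when each message is judged in isolation.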

Pressure on the platform

Roblox is already defending itself in court on related claims. Seitz v. Roblox was filed in federal court in Kentucky in October 2025, and Los Angeles County’s People v. Roblox followed in February 2026. Both cite the platform’s handling of child safety. Recent reporting in Bloomberg, The Times, and others has described grooming cases that began on Roblox and moved to other apps before causing real-world harm.

The audit’s authors recommend that platforms serving minors combine pattern-based detection with language models, evaluate full conversations rather than single messages, track repeat offenders across games and sessions, and give users clearer feedback when reports lead to action. The findings cover only public servers, so private-server activity sits outside the scope of the study. The numbers reported in the paper are described as a lower bound on what is actually getting through.
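The recommendation to evaluate full conversations can be illustrated with a rough Python sketch. It assumes a conversation is available as a list of message strings and uses a handful of crude, hypothetical regular expressions as risk signals. The patterns, the `needs_review` function, and the three-signal threshold are illustrative assumptions, not the detection logic proposed in the paper.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately crude signal patterns.
SIGNALS = {
    "age": re.compile(r"\bhow old\b|\byour age\b"),
    "location": re.compile(r"\bwhere (do )?(u|you) live\b|\baddress\b"),
    "off_platform": re.compile(r"\b(discord|snapchat|tiktok)\b"),
    "meeting": re.compile(r"\bmeet (up|me)\b"),
}


def conversation_signals(messages: list[str]) -> set[str]:
    """Collect the distinct signal types seen anywhere in the conversation."""
    found: set[str] = set()
    for message in messages:
        text = message.lower()
        for name, pattern in SIGNALS.items():
            if pattern.search(text):
                found.add(name)
    return found


def needs_review(messages: list[str], min_signals: int = 3) -> bool:
    """Flag a conversation once enough distinct signal types accumulate."""
    return len(conversation_signals(messages)) >= min_signals


chat = ["how old are you", "cool, where do you live", "add me on discord"]
per_message = [needs_review([m]) for m in chat]  # each line judged alone
whole_chat = needs_review(chat)                  # the same lines judged together
```

No single line here crosses the threshold, while the conversation as a whole does, which is the gap between per-message and conversation-level review that the authors describe.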
