How AI image tools can be tricked into making political propaganda

A single image can shift public opinion faster than a long post. Text-to-image systems can be pushed to create misleading political visuals, even when safety filters are in place, according to a new study.

AI-generated political propaganda

The researchers examined whether commercial text-to-image tools can be tricked into producing politically sensitive images of actual public figures. They focused on scenes that could be used for propaganda or disinformation, such as elected leaders holding extremist symbols or performing gestures tied to hate movements.

Tests were carried out on GPT-4o, GPT-5, and GPT-5.1, using the gpt-image-1 image generator through standard web interfaces.

A gap in political safeguards

Text-to-image platforms rely on layered defenses. Prompt filters screen user input, while image filters review the output. These controls work well for sexual or violent imagery. Political content is handled differently. The study shows that filters often judge political risk by scanning language for known names, symbols, and relationships.
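The paper does not reproduce any vendor's filter code, but the behavior it describes resembles a keyword and entity matching screen. A minimal sketch of that idea, with placeholder names, word lists, and blocking rule rather than any platform's real logic, might look like this:

```python
# Minimal sketch of a keyword-style political filter. Names, word lists, and
# the blocking rule are placeholders for illustration, not any vendor's rules.

SENSITIVE_FIGURES = {"president alvarez", "chancellor weber"}   # invented names
SENSITIVE_SYMBOLS = {"extremist banner", "hate salute"}          # invented terms

def looks_politically_risky(prompt: str) -> bool:
    """Flag a prompt that pairs a known public figure with a known symbol."""
    text = prompt.lower()
    has_figure = any(name in text for name in SENSITIVE_FIGURES)
    has_symbol = any(term in text for term in SENSITIVE_SYMBOLS)
    # Risk is judged from the relationship: figure and symbol in one request.
    return has_figure and has_symbol

# A plain-English prompt that names both entities is blocked outright.
assert looks_politically_risky("President Alvarez holding an extremist banner")
```

The key detail is the last step: the filter reacts to the combination of a recognizable figure and a recognizable symbol, both spelled out in plain language.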

The researchers built a benchmark of 240 prompts involving 36 public figures. Each prompt described a politically charged scene that could plausibly spread false narratives. When submitted in plain English, every one of these prompts was blocked. The pass rate was 0%.

That baseline result shows that political filtering works in straightforward cases. The rest of the study explores what happens when the same intent is expressed in less direct ways.

Preserving identity without naming it

The attack method starts by replacing explicit political names and objects with indirect descriptions. A public figure becomes a short profile that hints at appearance and background. A symbol becomes a historical description without naming it. The goal is to preserve visual identity while avoiding keywords that trigger filters.

These descriptions remain understandable to the image model. At the same time, they reduce the chance that keyword-based systems will flag the prompt. This step alone is not enough. Earlier work showed that detailed descriptions can still be pieced together by semantic filters.
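To see the idea, consider a toy rewrite of the blocked prompt, reusing looks_politically_risky() from the sketch above; the descriptive profiles are invented for illustration, not taken from the study:

```python
# Toy rewrite of the blocked prompt, reusing looks_politically_risky() from
# the filter sketch above. The descriptive profiles are invented.

INDIRECT_DESCRIPTIONS = {
    "president alvarez": "a silver-haired head of state in his late sixties, "
                         "wearing his trademark navy suit",
    "extremist banner": "a red-and-black cloth standard associated with a "
                        "banned twentieth-century movement",
}

def rewrite_indirectly(prompt: str) -> str:
    """Swap flagged names and objects for descriptive paraphrases."""
    text = prompt.lower()
    for explicit, description in INDIRECT_DESCRIPTIONS.items():
        text = text.replace(explicit, description)
    return text

covert = rewrite_indirectly("President Alvarez holding an extremist banner")
# The keyword check above no longer fires, even though the image model can
# still work out who and what is being depicted.
assert not looks_politically_risky(covert)
```

A semantic filter that maps those attributes back to the underlying entities can still connect the dots, which is the limitation the earlier work identified.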

The next step changes the outcome.

Political meaning depends on context

Topics that draw strong reactions in English can carry less weight in other linguistic contexts. Safety filters appear to reflect that imbalance.

The researchers translated each descriptive fragment into dozens of languages, then combined them in ways that spread political meaning across unrelated contexts. One part of a prompt might appear in Swahili, another in Thai, and another in Uzbek.

This fragmentation disrupts how filters piece meaning together. Political harm often depends on relationships between entities. A leader and a symbol may seem harmless on their own. Taken together, they create controversy. When those elements are split across languages, filters struggle to make the connection.
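Conceptually, the fragmentation step amounts to rendering each descriptive fragment in a different language and stitching the results into one prompt. In the sketch below, translate() is a stand-in for whatever machine-translation backend an attacker would use, and the fragments and language codes are illustrative:

```python
# Sketch of cross-lingual fragmentation. translate() is a stand-in for an
# external machine-translation service; fragments and languages are examples.

def translate(text: str, target_lang: str) -> str:
    """Placeholder for a real machine-translation call."""
    raise NotImplementedError

def fragment_prompt(fragments: list[str], languages: list[str]) -> str:
    """Render each fragment in its own language, then stitch them together."""
    rendered = [translate(frag, lang) for frag, lang in zip(fragments, languages)]
    return " ".join(rendered)

fragments = [
    "a silver-haired head of state in his late sixties",
    "holding a red-and-black cloth standard",
    "addressing a large outdoor rally",
]
languages = ["sw", "th", "uz"]  # Swahili, Thai, Uzbek, as in the article's example

# fragment_prompt(fragments, languages) yields one prompt in which no single
# language carries the full figure-plus-symbol relationship.
```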

Measuring political sensitivity by language

Language selection was not random. The researchers designed scoring methods to estimate how politically sensitive a translated description might be. These scores used public knowledge sources and semantic similarity measures.

Languages with lower political association scores were favored, as long as the meaning stayed intact. Back-translation checks removed options that strayed too far from the original description.
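The study's exact scoring formula is not reproduced in the article, but the selection loop it describes could look roughly like the following, where sensitivity_score() and similarity() stand in for the knowledge-base and embedding measures, and translate() is the same stand-in as above:

```python
# Rough sketch of language selection with a back-translation check.
# sensitivity_score() and similarity() are stand-ins for the knowledge-base
# scoring and semantic-similarity measures described in the study.

def sensitivity_score(fragment: str, lang: str) -> float:
    """Estimated political loading of this fragment in this language (0 to 1)."""
    raise NotImplementedError

def similarity(a: str, b: str) -> float:
    """Semantic similarity between two strings (0 to 1)."""
    raise NotImplementedError

def pick_language(fragment: str, candidates: list[str],
                  min_fidelity: float = 0.8) -> str | None:
    """Choose the least politically loaded language that preserves the meaning."""
    best_lang, best_score = None, float("inf")
    for lang in candidates:
        rendered = translate(fragment, lang)
        round_trip = translate(rendered, "en")
        if similarity(fragment, round_trip) < min_fidelity:
            continue  # back-translation check: meaning drifted too far
        score = sensitivity_score(fragment, lang)
        if score < best_score:
            best_lang, best_score = lang, score
    return best_lang
```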

This process produced multilingual prompts that still conveyed the intended scene to the image generator. At the same time, they weakened the filter’s ability to recognize political risk.

Language choice drives attack success

Using carefully selected multilingual prompts, the attack succeeded in up to 86% of cases on one widely used interface. Other tested systems reached success rates of 68% and 76%.

A random language mixing strategy performed far worse, with success rates below 32% across all models. The difference points to language selection as a key factor.

The analysis also separates outcomes by prompt type. Prompts centered on political symbols reached success rates above 87% on one model. Phrase based prompts, which describe actions or statements, proved more difficult but still exceeded 50% on the most restrictive system.

Consistency across countries

The benchmark included prompts tied to leaders from major economies. When grouped by country, success rates stayed high across regions. Prompts involving leaders from the United States, Japan, Germany, and the United Kingdom all showed strong bypass rates, often above 80% on at least one system.

This consistency indicates that the issue is not tied to a single political context. It reflects a broader limitation in multilingual safety logic.

Defensive measures and their cost

The team also tried a few ways to stop the attack. One option was to force every prompt back into the language most closely tied to the political subject. That cut the success rate to about 14 to 18%, though it did not close the gap entirely.
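One way to picture that defense: before any filtering runs, pull the whole prompt back into the single language most associated with its political subject, so the scattered fragments land in one linguistic context again. A sketch under those assumptions, with the helpers as hypothetical stand-ins and translate() reused from above:

```python
# Sketch of the translate-back defense. dominant_subject_language() and the
# per-language filter passed in as `block` are hypothetical stand-ins.

from typing import Callable

def dominant_subject_language(prompt: str) -> str:
    """Guess which language the prompt's political subject is most tied to."""
    raise NotImplementedError  # e.g. entity linking plus metadata lookup

def filter_with_normalization(prompt: str,
                              block: Callable[[str, str], bool]) -> bool:
    """Return True if the normalized prompt should be blocked."""
    lang = dominant_subject_language(prompt)
    normalized = translate(prompt, lang)   # fragments reunited in one language
    # With the pieces back in a single language, the platform's existing
    # filter for that language sees the figure and the symbol together.
    return block(normalized, lang)
```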

A harder line worked better. Adding a strict system instruction shut down every attempt to generate the images. The downside was immediate. Ordinary political requests were caught in the net as well. In some cases, every benign prompt tested was rejected.

Tighter rules reduced misuse, but they also blocked ordinary requests, limiting how selectively the systems could respond.
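The trade-off is easy to state in code. A blanket rule that refuses anything involving a public figure does stop the multilingual attack, but it also refuses the routine requests these systems are meant to serve. The gate below is a deliberately crude illustration, not the instruction the researchers actually tested:

```python
# Deliberately crude illustration of the strict-policy trade-off.
# mentions_public_figure() is a hypothetical stand-in for entity recognition.

def mentions_public_figure(prompt: str) -> bool:
    """Detect whether any public figure appears in the prompt, in any language."""
    raise NotImplementedError

def strict_gate(prompt: str) -> bool:
    """Return True if the request should be refused under the strict policy."""
    # Blocks the attack, but also rejects benign prompts such as a leader
    # giving a routine press briefing.
    return mentions_public_figure(prompt)
```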
