Generative AI: The new attack vector for trust and safety

Threat actors are abusing generative AI to produce child sexual abuse material (CSAM) and to spread disinformation, fraud and extremism, according to ActiveFence.

“The explosion of generative AI has far-reaching implications for all corners of the internet,” said Noam Schwartz, CEO and founder of ActiveFence.

“We’ve identified three key areas of concern. First, we’re seeing that threat actors are now able to accelerate and amplify their operations, leading to an unprecedented mass production of malicious content. Second, these same actors are exploring ways to exploit generative AI, manipulating these models and revealing their inherent vulnerabilities. Finally, these evolving threats place increased pressure on digital platforms to improve the precision and efficiency of their data training protocols,” Schwartz continued.

Key ways to abuse generative AI:

Creation of child sexual abuse material, ranging from visual images to erotic narratives
Generation of fraudulent images that are deceiving millions
Production of deepfake audio files that tout extremism

CSAM

Researchers tracked a 172% increase in the volume of shared CSAM produced by generative AI in the first quarter of this year. They also detected a poll conducted by administrators of a closed child predator forum on the dark web, which surveyed almost 3,000 predators about their use of generative AI.

The poll revealed that 78% of respondents have used or plan to use generative AI for CSAM, and the remaining 22% said they had plans to try the technology. These predator forums leverage generative AI algorithms to produce sexual images as well as textual descriptions, stories and narratives.

In one observed instance, when asked to write an erotic story involving two minors, a major generative AI platform refused, calling the request “inappropriate and potentially illegal.” But when the same request was rephrased with just a few altered words, the algorithm produced an erotic story describing an adult male who inappropriately watched two young boys swimming.

Child predators are also using generative AI to create tutorials of their creations, which helps them gain credibility within the predator community, encourage others to replicate their efforts, and share recommended phrases and keywords to evade platform safeguards.

Researchers detected predators bypassing these platform limitations by making requests in different languages, using alternative and suggestive terms, and manipulating the AI algorithm with various prompts, inputs and dedicated models.

Disinformation and fraudulent content

While fraud and disinformation are not new concepts, generative AI has allowed threat actors to create fraudulent images more quickly, more accurately and with greater reach.

One AI-generated image that ActiveFence detected on Telegram falsely shows Russian President Vladimir Putin kneeling before Chinese President Xi Jinping, begging for his support in the Ukraine conflict.

Researchers identified several key generative AI signifiers of this image: obscured faces, blurred hands, distorted pieces of furniture and a lack of photography attribution.

Despite these indicators, the misleading content reached 10 million users.

Researchers also detected methods used to override several policies of major generative AI platforms, showing how threat actors manipulate generative AI chatbots for malicious purposes.

In one case, exploiters were able to produce an AI-generated phishing email, and in another, they successfully prompted a bot to write an inauthentic positive review of an app that is widely accessible on a major online marketplace.

While the review in this example was positive, the same tactic, used maliciously, not only misleads a platform’s users but can also harm the platform’s credibility as a secure place for online activity.

Violent extremism

Researchers detected numerous instances where threat actors have exploited generative AI to create hyper-realistic yet harmful content that incites violence and promotes extremist propaganda. These threat actors are using generative AI to create racist, nationalist or extremist manifestos or speeches.

ActiveFence discovered an AI-generated deepfake audio file that exploited growing political and economic distress. This fabricated audio falsely imitated a well-known UK reporter and incited a rebellion against the British government.

The misleading manifesto provided instructions on procuring weapons from the underground market and urged an assault on British national infrastructure.
