What security teams can learn from torrent metadata

Security teams often spend time sorting through logs and alerts that point to activity happening outside corporate networks. Torrent traffic shows up in investigations tied to policy violations, insider risk, and criminal activity. A new research paper looks at that same torrent activity through an open-source intelligence (OSINT) lens and asks how much signal security teams can extract from data that is already public.

Turning torrent metadata into intelligence

Torrent files contain descriptive information such as file names, tracker URLs, and cryptographic hashes. Trackers return lists of peers connected to a specific file, including IP addresses and ports.
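As an illustration of what a tracker hands back, the sketch below decodes a UDP tracker announce response into a peer list. It assumes the layout described in the public UDP tracker protocol (BEP 15): a 20-byte header followed by 6 bytes per IPv4 peer. The function name and the returned dictionary keys are hypothetical, and this is not the collection code used in the study.

```python
import socket
import struct

def parse_announce_response(payload: bytes) -> dict:
    """Decode a UDP tracker announce response (IPv4, BEP 15 layout assumed)."""
    if len(payload) < 20:
        raise ValueError("announce response is shorter than the 20-byte header")
    # Header: action, transaction id, interval, leechers, seeders (big-endian u32 each).
    action, transaction_id, interval, leechers, seeders = struct.unpack(
        ">IIIII", payload[:20]
    )
    if action != 1:  # 1 marks an announce response
        raise ValueError(f"unexpected action {action}")
    body = payload[20:]
    peers = []
    # Each peer is 4 bytes of IPv4 address plus a 2-byte port; ignore a trailing partial entry.
    for offset in range(0, len(body) - len(body) % 6, 6):
        ip = socket.inet_ntoa(body[offset:offset + 4])
        (port,) = struct.unpack(">H", body[offset + 4:offset + 6])
        peers.append((ip, port))
    return {
        "transaction_id": transaction_id,
        "interval": interval,
        "leechers": leechers,
        "seeders": seeders,
        "peers": peers,
    }
```

Each `(ip, port)` pair returned here is the kind of record that would then be passed to enrichment.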

The study collected metadata from The Pirate Bay and public UDP trackers across 206 popular torrents. The resulting dataset included more than 60,000 unique IP addresses. Each IP address was enriched using public services that provide geolocation, ISP ownership, autonomous system data, and indicators tied to VPNs or hosting providers.

An external monitoring database was also used to flag IP addresses with prior links to child exploitation material. The researchers avoided collecting illicit content directly and relied on cross-referencing existing public flags.

Giuseppe Cascavilla, Assistant Professor at Tilburg University and co-author of the study, told Help Net Security that the reliance on UDP tracker data was a deliberate design choice for the proof of concept, even though it introduced visibility limits. He explained that expanding the collection to include large-scale distributed hash table (DHT) scraping would mainly affect coverage.

“Adding DHT scraping would increase recall by capturing trackerless and more evasive peers who deliberately avoid centralized infrastructure,” Cascavilla said. “This would likely reinforce the observed relationship between anonymization and higher-risk behavior and produce denser network structures with more bridging peers.”

Cascavilla added that the current findings should be interpreted as a conservative snapshot of activity that is already observable through public trackers.

“The behavioral clusters and risk signals identified through UDP trackers are expected to persist,” he said. “The current results represent a lower bound on observable activity.”

A structured OSINT workflow

The research follows a five-stage OSINT process that includes source identification, collection, processing, analysis, and reporting. Torrent indexes supplied file metadata and uploader details. Trackers supplied peer lists. IP intelligence services added geographic and infrastructure context.

Data processing focused on cleaning inconsistent fields and standardizing ISP and location names. Privacy indicators were derived from metadata fields and keyword analysis of infrastructure ownership.
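A minimal sketch of that processing step might look like the following. The keyword list, the record fields (`isp`, `org`, `proxy`, `hosting`), and the function names are all assumptions made for illustration. The article does not state which keywords or fields the researchers used.

```python
import re

# Hypothetical keywords; the study's actual list is not given in the article.
PRIVACY_KEYWORDS = ("vpn", "proxy", "hosting", "datacenter", "cloud")

def normalize_owner(name: str) -> str:
    """Lowercase an ISP or organization name and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", name.strip().lower())

def has_privacy_indicator(record: dict) -> bool:
    """Flag a record using explicit metadata fields first, then ownership keywords."""
    # Assumed boolean fields returned by an IP intelligence service.
    if record.get("proxy") or record.get("hosting"):
        return True
    owner = normalize_owner(
        f"{record.get('isp') or ''} {record.get('org') or ''}"
    )
    return any(keyword in owner for keyword in PRIVACY_KEYWORDS)
```

Substring matching of this kind is coarse and will produce false positives, so a flag is better read as an indicator to weigh than as proof that an address belongs to a privacy service.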

Analysis relied heavily on network graphs. Bipartite graphs linked IP addresses to torrents. These graphs were then projected into IP-to-IP networks based on shared participation in the same swarm. Content-to-content graphs connected files downloaded by the same users.
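The projection step can be sketched with the standard library alone. In the example below, the swarm data, the documentation-range IP addresses, and the function names are invented for illustration. Edge weights count how many torrents two IPs share, or how many IPs two torrents share, which is one common way to weight a projection and not necessarily the weighting used in the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations

def project(groups: dict) -> Counter:
    """Project a bipartite mapping (group -> members) onto its members.

    Returns a Counter keyed by sorted member pairs, weighted by shared groups.
    """
    weights = Counter()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return weights

def invert(groups: dict) -> dict:
    """Flip group -> members into member -> groups."""
    inverted = defaultdict(set)
    for group, members in groups.items():
        for member in members:
            inverted[member].add(group)
    return dict(inverted)

# Made-up bipartite data: torrent -> IPs observed in its swarm.
swarms = {
    "torrent_a": {"198.51.100.1", "198.51.100.2", "198.51.100.3"},
    "torrent_b": {"198.51.100.1", "198.51.100.2"},
    "torrent_c": {"198.51.100.3"},
}

ip_graph = project(swarms)               # IP-to-IP: shared swarms
content_graph = project(invert(swarms))  # content-to-content: shared IPs
```

In this toy data, the two IPs that appear together in both `torrent_a` and `torrent_b` end up with the heaviest IP-to-IP edge, and those two torrents get the heaviest content-to-content edge.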

Patterns tied to higher-risk behavior

The dataset showed consistent use of privacy services among IP addresses associated with higher-risk signals. About one-fifth of all observed IPs showed VPN or proxy indicators. Among IPs flagged for child exploitation material, the share showing privacy indicators exceeded three-quarters.

Geographic clustering appeared across multiple analyses. High-frequency peer relationships often aligned with regional groupings, with cross-border links appearing across popular content categories.

A focused case study examined a set of e-books uploaded in 2013 that covered explosives, weapons, and related topics. These torrents still attracted active peers more than a decade later. Users who downloaded multiple titles in this category also showed distinct behavioral patterns when mapped through network analysis.

Some IP addresses displayed concentrated activity around sensitive material with limited overlap into mainstream downloads. Others showed mixed interest across instructional material and popular media. These patterns emerged through co-download relationships rather than file names alone.

Drawing operational boundaries

Cascavilla said the research team recognizes that anonymization tools are widely used across torrent ecosystems, including by users engaged in routine file sharing. He said the proposed system focuses on behavior over time rather than one-off signals.

“By collecting data on a daily basis, it becomes possible to profile specific users through repeated activity patterns,” Cascavilla said. “The system is designed to operate as one component within a broader investigative process that includes additional procedures and intelligence work.”

He noted that some users involved in illegal content distribution continue to operate without VPN services. These users often appear within tightly connected networks tied to specific content categories.

“These users form identifiable clusters linked to illegal material,” Cascavilla said. “Through this approach, investigators can identify those users, collect IP-level information, and verify involvement in specific distribution patterns such as child exploitation material.”

Cascavilla emphasized that content targeting remains a key requirement. Keyword-driven discovery and focused torrent selection play a role in narrowing the analysis to relevant material.

“The proof of concept demonstrates the feasibility of analyzing torrent files at scale,” he said. “Operational use requires integration with investigative intelligence that guides which torrents and users should be examined.”

Limits and future work

The authors acknowledge practical constraints around manual collection and reliance on UDP trackers. Automation and broader DHT coverage would expand visibility and scale. The case study scope focused on a narrow content category to manage legal and ethical risk.

Future work points toward automated pipelines that integrate torrent metadata into broader OSINT platforms. For security teams already monitoring peer-to-peer activity, the research offers a framework for extracting more context from data that often sits at the edge of investigations.