OpenWPM: An automated, open source framework for measuring web privacy

Among the speakers at the first-ever PrivacyCon, organized by the US Federal Trade Commission (FTC) and held last Thursday in Washington, DC, was Steven Englehardt, a Ph.D. candidate in Princeton University's Department of Computer Science and a graduate research fellow at the Center for Information Technology Policy.

In his talk, part of the session titled "The Current State of Online Privacy," Englehardt described an open source web measurement platform that he and his Princeton colleagues developed. The platform is already being used for research by students, regulators, journalists and others.

The platform, called OpenWPM, is designed for measuring online tracking.

“Our goal in developing OpenWPM is to decrease the initial engineering cost of studies and make running a measurement as effortless as possible. It has already been used in several published studies from multiple institutions to detect and reverse engineer online tracking,” Englehardt noted.

He explained how difficult it usually is for researchers to find tools that let them track privacy violations over long periods of time, even though such long-term measurement has proved crucial in bringing about policy changes and new technical and legal solutions that improve privacy for everyone.

“OpenWPM also makes it possible to run large-scale measurements with Firefox, a real consumer browser,” he noted, and explained: “Crawling with a real browser is important for two reasons: (1) it’s less likely to be detected as a bot, meaning we’re less likely to receive different treatment from a normal user, and (2) a real browser supports all the modern web features (e.g. WebRTC, HTML5 audio and video), plugins (e.g. Flash), and extensions (e.g. Ghostery, HTTPS Everywhere). Many of these additional features play a large role in the average user’s privacy online.”

Englehardt and his colleagues are currently using OpenWPM to measure tracking techniques and privacy issues on 1 million sites each month.

"With it, we will be able to detect and measure many of the known privacy violations reported by researchers so far: the use of stateful tracking mechanisms, browser fingerprinting, cookie synchronization, and more," he said. The research is not yet complete, but should be soon.

In the meantime, he has invited other researchers to use OpenWPM in their own work. The framework's code is available on GitHub, and more technical details can be found in this paper.