Who’s tracking you online, and how?

Armed with a tool that mimics a consumer browser but is actually bent on discovering all the ways websites are tracking visitors, Princeton University researchers have discovered several device fingerprinting techniques never before seen in the wild.

The web privacy measurement tool is called OpenWPM, and has been open sourced. Its creators are the very same researchers who performed this latest study.

They crawled the 1 million most popular websites on the Internet and analyzed the resulting measurements. They found that tracking, both cookie-based and fingerprint-based, is ubiquitous, but that the number of third parties (i.e., trackers) that a regular user encounters daily is relatively small.

“The effect is accentuated when we consider that different third parties may be owned by the same entity,” they noted. “All of the top 5 third parties, as well as 12 of the top 20, are Google-owned domains. In fact, Google, Facebook, and Twitter are the only third-party entities present on more than 10% of sites.”
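To illustrate the point about shared ownership, here is a minimal sketch of how per-domain counts can be rolled up into per-entity prevalence. It is not the researchers' code: the `OWNER` mapping, the `entity_prevalence` function and the crawl data are all made up for the example, and a real analysis would need a curated ownership list.

```python
from collections import defaultdict

# Hypothetical domain-to-owner mapping (illustrative, not from the study).
OWNER = {
    "google-analytics.com": "Google",
    "doubleclick.net": "Google",
    "gstatic.com": "Google",
    "facebook.com": "Facebook",
    "twitter.com": "Twitter",
}

def entity_prevalence(site_third_parties):
    """Return the fraction of sites on which each owning entity appears."""
    sites_per_entity = defaultdict(set)
    for site, domains in site_third_parties.items():
        for domain in domains:
            # Domains without a known owner are counted as their own entity.
            sites_per_entity[OWNER.get(domain, domain)].add(site)
    total = len(site_third_parties)
    return {entity: len(sites) / total for entity, sites in sites_per_entity.items()}

# Made-up crawl results: first-party site -> third-party domains it loaded.
crawl = {
    "news.example": ["google-analytics.com", "doubleclick.net", "facebook.com"],
    "shop.example": ["gstatic.com", "twitter.com"],
    "blog.example": ["doubleclick.net"],
    "wiki.example": [],
}
prevalence = entity_prevalence(crawl)
```

In this toy data no single Google-owned domain appears on more than half of the sites, yet the entity as a whole is present on three of the four, which is the kind of concentration the researchers describe.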

Not wholly unexpectedly, news sites sport the largest number of trackers, followed by arts, sports, home, games and shopping sites.

Level of tracking on different categories of websites

“Sites on the high end of the spectrum are largely those which provide editorial content. Since many of these sites provide articles for free, and lack an external funding source, they are pressured to monetize page views with significantly more advertising,” the researchers pointed out.

Another interesting revelation by this study is that device fingerprinting is becoming an increasingly popular tracking technique.

The use of the (HTML5) canvas element to fingerprint devices and consequently users is well known, but new techniques have arisen: font fingerprinting using canvas, and fingerprinting by abusing the WebRTC API, the Battery Status API, and the AudioContext API.

The first technique recovers the browser’s font list via the canvas element. The second, which abuses WebRTC, is used to discover the visitor’s IP address.

The third (very rarely used) checks the current battery level or charging status of the device. The fourth consists of scripts that check for the AudioContext API’s existence and/or process an audio signal generated with the OscillatorNode API.

Device fingerprinting via the AudioContext API

Fingerprinting techniques are often used in combination, which is what makes them effective at identifying individual users.
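A rough sketch of why combining signals helps: each signal alone may be shared by many devices, but hashing several together yields an identifier that changes when any one of them differs. The signal names and values below are invented for illustration, and `combined_fingerprint` is a hypothetical helper, not something taken from any tracker's script.

```python
import hashlib
import json

def combined_fingerprint(signals):
    """Hash a dict of signal values into one stable identifier."""
    # Sorting the keys makes the serialization independent of insertion order.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two made-up devices that agree on canvas and fonts but differ on audio.
device_a = {"canvas_hash": "9f2c", "fonts": ["Arial", "Calibri"], "audio_hash": "41aa"}
device_b = {"canvas_hash": "9f2c", "fonts": ["Arial", "Calibri"], "audio_hash": "7be0"}

fp_a = combined_fingerprint(device_a)
fp_b = combined_fingerprint(device_b)
```

Here the two devices would be indistinguishable by canvas or font data alone, and only the additional audio signal separates them.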

These newer fingerprinting techniques are also more difficult for existing privacy tools to spot.

The researchers found that the Ghostery browser extension is particularly effective at blocking well-known trackers (both cookie-based tracking and canvas fingerprinting attempts). But more obscure trackers and fingerprinting techniques, like the ones mentioned above, still get through.

“This makes sense given that the block list is manually compiled and the developers are less likely to have encountered obscure trackers,” they noted, adding that this type of research should help the developers fill in the gaps.
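The limitation of a manually compiled list can be shown with a simplified sketch of domain-based blocking. This is not how Ghostery is actually implemented; the `BLOCK_LIST` entries, the `is_blocked` function and the URLs are assumptions chosen for the example.

```python
from urllib.parse import urlsplit

# Hypothetical, hand-maintained list of known tracker domains.
BLOCK_LIST = {"doubleclick.net", "google-analytics.com"}

def is_blocked(url, block_list=BLOCK_LIST):
    """Return True if the URL's host, or any parent domain of it, is listed."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Check the full host first, then each successively shorter parent domain.
    return any(".".join(labels[i:]) in block_list for i in range(len(labels)))

known = is_blocked("https://stats.g.doubleclick.net/pixel")
obscure = is_blocked("https://cdn.obscure-tracker.example/fp.js")
```

A request to a subdomain of a listed tracker is caught, while a script served from a domain nobody has added to the list yet is let through, regardless of what fingerprinting it performs.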
