Browsing histories can be used to compile unique browsing profiles, which can be used to track users, Mozilla researchers have confirmed.
Many third parties are also pervasive enough across the web to gather browsing histories detailed enough to serve as identifiers.
This is not the first time that researchers have demonstrated that browsing profiles are distinctive and stable enough to be used as identifiers.
Sarah Bird, Ilana Segall and Martin Lopatka were spurred to reproduce the results set forth in a 2012 paper by Lukasz Olejnik, Claude Castelluccia, and Artur Janc, by using more refined data, and they’ve extended that work to detail the privacy risk posed by the aggregation of browsing histories.
The Mozillians collected browsing data from ~52,000 Firefox users for 7 calendar days, then paused for 7 days, and then resumed for an additional 7 days. After analyzing the collected data, they identified 48,919 distinct browsing profiles, of which 99% are unique. (The original paper observed a set of ~400,000 web history profiles, of which 94% were unique.)
“High uniqueness holds even when histories are truncated to just 100 top sites. We then find that for users who visited 50 or more distinct domains in the two-week data collection period, ~50% can be reidentified using the top 10k sites. Reidentifiability rose to over 80% for users that browsed 150 or more distinct domains,” they noted.
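To make the idea of profile uniqueness concrete, here is a minimal Python sketch. It is not the researchers' code: the function names, the toy data, and the choice to treat a profile as the set of visited domains restricted to a top-site list are all assumptions made for illustration.

```python
from collections import Counter


def truncate(history, top_sites):
    """Keep only the visited domains that appear in the chosen top-site list."""
    return frozenset(history) & top_sites


def unique_fraction(histories, top_sites):
    """Share of non-empty profiles that are not shared with any other user.

    Assumption: a profile is the set of visited domains after truncation.
    """
    profiles = [truncate(h, top_sites) for h in histories.values()]
    profiles = [p for p in profiles if p]  # drop users with no visits to top sites
    if not profiles:
        return 0.0
    counts = Counter(profiles)
    return sum(1 for p in profiles if counts[p] == 1) / len(profiles)


# Hypothetical toy data: four users and the domains they visited.
histories = {
    "u1": {"a.com", "b.com", "c.com"},
    "u2": {"a.com", "b.com"},
    "u3": {"a.com", "b.com", "d.com"},
    "u4": {"a.com", "b.com"},
}

all_sites = frozenset({"a.com", "b.com", "c.com", "d.com"})
top_two = frozenset({"a.com", "b.com"})

print(unique_fraction(histories, all_sites))  # 0.5
print(unique_fraction(histories, top_two))    # 0.0
```

In this toy set, truncating to two sites erases every distinguishing visit. The research found the opposite at real-world scale: uniqueness stayed high even with only 100 top sites.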
They also confirmed that browsing history profiles are stable over time, a second prerequisite for these profiles being repeatedly tied to specific users/consumers and used for online tracking.
“Our reidentifiability rates in a pool of 1,766 were below 10% for 100 sites despite a >90% profile uniqueness across datasets, but increased to ~80% when we consider 10,000 sites,” they added.
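Reidentification can be pictured as matching each profile from the second collection week against the profiles from the first. The Python sketch below assumes Jaccard similarity between domain sets as the matching metric; the article does not state which metric the researchers used, and the function names and toy data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reidentify(week1, week2):
    """For each week-2 profile, guess the week-1 user with the most similar profile."""
    return {
        user: max(week1, key=lambda cand: jaccard(week1[cand], profile))
        for user, profile in week2.items()
    }


def reidentification_rate(week1, week2):
    """Fraction of week-2 profiles matched back to the correct week-1 user."""
    guesses = reidentify(week1, week2)
    return sum(1 for user, guess in guesses.items() if user == guess) / len(guesses)


# Hypothetical toy data: the same three users observed in two separate weeks.
week1 = {
    "u1": {"a.com", "b.com", "c.com"},
    "u2": {"a.com", "d.com", "e.com"},
    "u3": {"f.com", "g.com"},
}
week2 = {
    "u1": {"a.com", "b.com", "c.com", "h.com"},
    "u2": {"a.com", "d.com"},
    "u3": {"a.com", "b.com", "g.com"},
}

print(reidentify(week1, week2))  # {'u1': 'u1', 'u2': 'u2', 'u3': 'u1'}
```

Two of the three toy users are matched correctly. The third changed habits enough between weeks to be mistaken for someone else, which is how reidentifiability can stay low even when profiles are highly unique.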
Finally, some corporate entities like Alphabet (Google) and Facebook are able to observe the web to an even greater extent than when the research for the 2012 paper was conducted, which may allow them to gain deep visibility into browsing activity and use that visibility for effective online tracking, even if users browse the internet on different devices.
Other recent research has shown that anonymization of browsing patterns/profiles through generalization does not sufficiently protect users’ anonymity.
Regulation is needed
Privacy researcher Lukasz Olejnik, one of the authors of the 2012 paper, noted that the findings of this newest research are a welcome confirmation that web browsing histories are personal data that can reveal insights about users or be used to track them.
“In some ways, browsing history resemble biometric-like data due to their uniqueness and stability,” he commented, and pointed out that, since this data allows individuals to be singled out from among many, it automatically comes under the General Data Protection Regulation (GDPR).
“Web browsing histories are private data, and in certain contexts, they are personal data. Now the state of the art in research indicates this. Technology should follow. So too should the regulations and standards in the data processing. As well as enforcement,” he concluded.